Effects of Altering Levothyroxine (L-T4) Doses on Quality of Life, Mood, and Cognition in L-T4 Treated Subjects

Abstract

Background

The brain is a critical target organ for thyroid hormone, but it is unclear whether variations in thyroid function within and near the reference range affect quality of life, mood, or cognition.

Methods

A total of 138 subjects with levothyroxine (L-T4)-treated hypothyroidism and normal thyrotropin (TSH) levels underwent measures of quality of life (36-Item Short Form Health Survey, Underactive Thyroid-Dependent Quality of Life Questionnaire), mood (Profile of Mood States, Affective Lability Scale), and cognition (executive function, memory). They were then randomly assigned to receive an unchanged, higher, or lower L-T4 dose in double-blind fashion, targeting one of three TSH ranges (0.34 to 2.50, 2.51 to 5.60, or 5.61 to 12.0 mU/L). Doses were adjusted every 6 weeks based on TSH levels. Baseline measures were reassessed at 6 months.

Results

At the end of the study, by intention to treat, mean L-T4 doses were 1.50 ± 0.07, 1.32 ± 0.07, and 0.78 ± 0.08 μg/kg (P < 0.001), and mean TSH levels were 1.85 ± 0.25, 3.93 ± 0.38, and 9.49 ± 0.80 mU/L (P < 0.001), respectively, in the three arms. There were minor differences in a few outcomes between the three arms, which were no longer significant after correction for multiple comparisons. Subjects could not ascertain how their L-T4 doses had been adjusted (P = 0.55) but preferred L-T4 doses they perceived to be higher (P < 0.001).

Conclusions

Altering L-T4 doses in hypothyroid subjects to vary TSH levels in and near the reference range does not affect quality of life, mood, or cognition. L-T4-treated subjects prefer perceived higher L-T4 doses despite a lack of objective benefit. Adjusting L-T4 doses in hypothyroid patients based on symptoms in these areas may not result in significant clinical improvement.
Overt hypothyroidism interferes with brain functions (1), but effects of variations in thyroid function within and near the reference range are less clear. Observational studies of this issue have been inconsistent, and a few randomized, blinded intervention studies have been negative (1–11). In the absence of consensus, many patients with mild thyrotropin (TSH) elevations are treated with levothyroxine (L-T4) to improve neurocognitive symptoms, and L-T4 doses are often increased to treat persistent symptoms.

We recruited hypothyroid subjects treated with L-T4 who underwent testing for health status, mood, and cognitive function. We targeted cognitive domains preferentially affected by mild thyroid dysfunction (memory and executive function) (1). We then adjusted subjects’ L-T4 doses in a blinded fashion over 6 months to achieve one of three TSH ranges (low-normal, high-normal, or mildly elevated), and repeated the tests. We hypothesized that altering TSH levels in these ranges would affect quality of life, mood, memory, and executive function.

Experimental Subjects

A total of 197 hypothyroid subjects receiving L-T4 monotherapy were recruited from the authors’ clinics, through review of electronic health records, and by flyers. All were diagnosed as adults and had past elevated TSH levels. L-T4 doses were stable for ≥3 months. None had acute or chronic illnesses or were on medications that affect thyroid hormone levels, mood, or cognition. Stable doses of oral contraceptive or estrogen therapy were allowed. Testing was done during the first 14 days after onset of menstrual bleeding or an oral contraceptive cycle in premenopausal women.

Materials and Methods

Experimental design

The protocol was approved by the Oregon Health & Science University (OHSU) Institutional Review Board. Subjects gave written informed consent.
Screening visit

Subjects were screened for general health, medicines, thyroid status, and mood or cognitive disorders by history, physical examination, and laboratory testing. General intelligence was estimated by the Wechsler Adult Intelligence Scale–Revised (WAIS-R) Vocabulary subtest (12).

Run-in visits

Subjects taking branded L-T4 with normal screening TSH levels proceeded directly to the baseline visit. Subjects who had abnormal screening TSH levels or were taking generic L-T4 were placed on branded L-T4 and underwent run-in visits every 6 weeks with L-T4 dose adjustments until doses were stable, with normal TSH levels for 3 months.

Baseline visit

Within 6 weeks of the screening or final run-in visit, subjects returned for a 4-hour baseline visit. Subjects refrained from taking their L-T4 dose that morning. Serum TSH, free thyroxine (fT4), and free triiodothyronine (fT3) levels were obtained. Subjects self-completed the following validated surveys:

- Billewicz scale of hypothyroid-related symptoms (13)
- Underactive Thyroid-Dependent Quality of Life Questionnaire, which measures the impact of hypothyroidism on quality of life (14)
- 36-Item Short Form Health Survey (SF-36), a general health questionnaire (15)
- Profile of Mood States (POMS), a mood questionnaire (16)
- Affective Lability Scale, where subjects rate the tendency of their moods to fluctuate (17)

Cognitive tests were administered by a single experienced research assistant.

Executive function

Attention and concentration

Letter Cancellation Test. The subject was given a sheet of paper with 6 lines of 52 letters in random sequence and instructed to circle two specified target letters as quickly as possible. The score was the number of errors and time needed (18).

Cognitive flexibility

Trail Making Test. The subject connected circles on a sheet of paper as quickly as possible. In Part A, the subject drew lines to connect numbered circles in ascending order.
In Part B, the subject drew lines to connect circles in ascending order, alternating between numbers and letters. The score was the number of errors and time needed (19).

Decision making

Iowa Gambling Task. Four decks of cards were shown face down on a computer screen. The subject chose cards from any deck, resulting in the gain or loss of money. The subject was unaware that two decks were advantageous (small gains, smaller losses), and two were disadvantageous (large gains, larger losses). The subject’s choices were classified as advantageous (X) or disadvantageous (Y), with a net score of X − Y, over 5 trials of 100 cards each (20).

Working memory

N-Back test. A series of letters was presented one at a time on a computer screen. Subjects responded each time a letter appeared that they had seen on the previous screen (1-back). The task was repeated with intervening letters imposed while the subjects had to hold in mind letters that had appeared 2-back and then 3-back. The score was the total number correct on target and the total number incorrect nontarget for each condition (21).

Subject-Ordered Pointing. Subjects viewed a series of computer screens that presented abstract drawings (6, 8, 10, or 12 per screen). Each screen in a set showed the same array of drawings but in a different spatial arrangement. The subject indicated one drawing per screen, avoiding the same drawing on subsequent screens. Subjects erred when they chose a drawing that had been previously chosen. Each set was repeated three times. The score was the total number of errors across each screen set (22).

Declarative memory

Paragraph Recall (verbal memory). Subjects were read a brief story and verbally recalled it immediately and after 30 minutes. The score was the total number of story elements recalled at each interval (23).
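The Iowa Gambling Task net score described above (advantageous choices X minus disadvantageous choices Y) can be sketched as follows. This is a minimal illustration, not the study's scoring software: the deck labels and the `net_score` and `net_scores_by_block` helpers are assumptions introduced here, because the text does not name the decks or describe the implementation.

```python
from collections import Counter

# Hypothetical deck labels: the text says only that two decks were
# advantageous and two were disadvantageous.
ADVANTAGEOUS_DECKS = frozenset({"C", "D"})
DISADVANTAGEOUS_DECKS = frozenset({"A", "B"})


def net_score(choices):
    """Return X - Y for a sequence of deck choices."""
    counts = Counter(choices)
    advantageous = sum(counts[deck] for deck in ADVANTAGEOUS_DECKS)
    disadvantageous = sum(counts[deck] for deck in DISADVANTAGEOUS_DECKS)
    return advantageous - disadvantageous


def net_scores_by_block(choices, block_size):
    """Split the choices into consecutive blocks and score each block."""
    return [
        net_score(choices[start:start + block_size])
        for start in range(0, len(choices), block_size)
    ]
```

A positive net score indicates that the subject drew more often from the advantageous decks; scoring by block shows whether that preference develops over the course of the task.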
Motor learning

Pursuit Rotor. Subjects held a photosensitive wand to maintain contact with a 2-cm light disk rotating on a turntable (Lafayette Instrument Company, Lafayette, IN). Two blocks of eight 20-second trials were administered, with a 20-second rest after each trial and a 60-second rest period after four trials. After a 30-minute interval, the two blocks were repeated. The score was the mean total time the stylus remained on target (24).

Motor Sequence Learning Test. The subject memorized two keypress sequences, each associated with a letter of the alphabet. As soon as that letter appeared on the computer screen, the subject performed the appropriate sequence as quickly as possible. Subjects performed 10 blocks of 18 trials each. The score was the total movement time (time from character presentation to completion of the sequence) (25).

Randomization

Immediately after the baseline visit, subjects were randomly assigned to one of three arms: low-normal TSH (0.34 to 2.50 mU/L), high-normal TSH (2.51 to 5.60 mU/L), or mildly elevated TSH (5.61 to 12.0 mU/L). These were based on the OHSU TSH assay reference range, recent debate over restricting the upper limit to 2.50 mU/L (26), and our intention to restrict elevated TSH levels to the subclinical hypothyroid range. Randomization was stratified by whether the subject’s TSH was low- or high-normal.

L-T4 dosing

Taking into account baseline TSH levels, the dispensing physician (K.G.S.) initially determined whether subjects should continue their usual L-T4 dose or receive a different dose to achieve the assigned target TSH ranges. If a different dose was indicated, the subject’s usual dose was altered by 25 to 50 μg, depending on the difference between the initial and target TSH levels. The principal investigator (M.H.S.), research assistants, and subjects were unaware of the treatment assignment or L-T4 doses.
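The three target TSH ranges can be written as a small lookup, which is also one way a measured TSH value could be mapped to an arm. This is a sketch under stated assumptions: the function name `tsh_arm`, the `None` result for out-of-range values, and the handling of values that fall between the published two-decimal cutoffs are illustrative choices, not the authors' procedure.

```python
def tsh_arm(tsh_mu_per_l):
    """Map a TSH value (mU/L) to one of the three study ranges.

    Returns None when the value lies outside 0.34 to 12.0 mU/L.
    Assumption: values between published cutoffs (for example 2.505)
    are assigned to the higher range.
    """
    if 0.34 <= tsh_mu_per_l <= 2.50:
        return "low-normal"
    if 2.50 < tsh_mu_per_l <= 5.60:
        return "high-normal"
    if 5.60 < tsh_mu_per_l <= 12.0:
        return "mildly elevated"
    return None
```

For example, the mean end-of-study TSH levels reported in the abstract (1.85, 3.93, and 9.49 mU/L) map to the low-normal, high-normal, and mildly elevated ranges, respectively.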
The OHSU research pharmacy dispensed 6-week supplies of L-T4 pills in opaque gel capsules to maintain blinding.

Interim visits

At 6, 12, and 18 weeks, subjects returned for brief visits. The principal investigator assessed clinical effects and determined whether the subject could comfortably continue the study. TSH levels from these visits were reviewed by K.G.S., who adjusted L-T4 doses if the interim TSH level was not in the target range. L-T4 doses were adjusted by 12.5 to 50 μg depending on the difference between the interim and target TSH levels, and the research pharmacy dispensed new 6-week supplies. Additional interim visits were allowed if the TSH level was not in the target range at 18 weeks. Once the TSH was in the target range, no further interim visits were scheduled, and the subject proceeded to the end-of-study visit.

End-of-study visit

Approximately 6 weeks after the final interim visit, baseline measurements were repeated. At this visit, TSH, fT4, and fT3 levels were measured, and this TSH level was subsequently used to assign subjects to actual end-of-study TSH arms for the purposes of data analysis. The subject was then placed back on his or her usual L-T4 dose, or a dose that led to better TSH control during the study, per subject preference.

Analytic methods

TSH was measured by immunochemiluminometric assay (Beckman Coulter): functional sensitivity 0.02 mU/L, normal range 0.34 to 5.60 mU/L, and interassay coefficient of variation (CV) 5% at 0.70 mU/L. fT4 was measured by direct equilibrium dialysis (Quest Diagnostics): sensitivity 0.08 ng/dL, normal range 0.8 to 2.7 ng/dL, and interassay CV 6.8% at 0.3 ng/dL and 1.6% at 3.8 ng/dL. fT3 was measured by tracer dialysis (Quest Diagnostics): sensitivity 25 pg/dL, normal range 210 to 440 pg/dL, and interassay CV 4%. TSH levels were measured at the time of testing, with stable assay characteristics during the study. fT4 and fT3 levels were batched and analyzed at the end of the study.
All samples were run in duplicate.

Statistical methods

Differences between arms for continuous measures were analyzed with multiple linear regression models adjusted for age, sex and estrogen status, years of education, WAIS-R score, baseline body mass index (BMI), change in BMI, baseline TSH (low- vs high-normal), standard deviation of TSH values at interim and last visits, baseline L-T4 dose (μg/kg), time on L-T4, time on L-T4 dose, and baseline value of the outcome variable. Binary measures were analyzed with multiple logistic regression models that adjusted for baseline TSH (low- vs high-normal) and baseline values of the outcomes because of the more limited nature of the binary data. For outcomes with significant differences between arms, the Tukey multiple comparison procedure was used to determine which arms were significantly different and adjust P values for pairwise differences between arms.

Because not all outcomes were independent, multiple testing P value adjustments were made for groups of outcomes that included a significant outcome. These outcomes included the SF-36 summary and subscales (10 outcomes total), the POMS subscales (6 outcomes total), N-Back (correct and incorrect variables for 1-back, 2-back, 3-back; 6 outcomes total), and Letter Cancellation Test (time and error, 2 outcomes total).

Analyses were conducted as intention-to-treat and by the actual TSH arms subjects achieved at the end of the study. We also examined relationships between outcomes and TSH, fT4, or fT3 by using the same regression analyses but substituting, in separate models, the selected hormone for the categorical arms variable. All analyses were conducted in R version 3.3.2 (R Foundation for Statistical Computing) (27).

Results

Demographic, clinical, and thyroid function parameters

Figure 1 provides a flowchart of the study design and subject enrollment.
Of the 197 subjects initially screened, 24 were excluded because of abnormal laboratory tests [low-density lipoproteins >160 mg/dL (n = 11), glucose >120 mg/dL (n = 2), elevated serum calcium (n = 1)], abnormal electrocardiogram (n = 3), TSH out of range (n = 5), or medical issues (n = 2). Fifty subjects were taking branded L-T4 and had normal screening TSH levels, and they proceeded directly to the baseline visit. One hundred twenty-three subjects were taking generic L-T4 or had abnormal screening TSH levels and proceeded to the run-in. One hundred fifty-one subjects completed the baseline visit. Twenty-two subjects withdrew during the run-in [personal issues (n = 17), started other medications (n = 2), started weight loss diet (n = 1), medical issues (n = 2)]. Thirteen subjects withdrew before the final visit [personal issues (n = 5), medical issues (n = 6), pregnancy (n = 1), started weight loss diet (n = 1)]. Seven of these withdrew before the 6-week interim visit, 4 before the 12-week interim visit, and 2 before the 18-week interim visit. Subjects who were excluded or withdrew were not different from the study population in demographic or clinical attributes.

Figure 1. Flowchart of study design and enrollment.

A total of 138 subjects completed the study (125 female, 13 male). They were aged 27 to 70 years and were receiving L-T4 for primary hypothyroidism (n = 112), hypothyroidism after iodine-131 therapy for Graves disease (n = 17), postpartum thyroiditis leading to permanent hypothyroidism (n = 3), or thyroid surgery (n = 6). They had received L-T4 for 5 months to 50 years (mean 12 years). Mean time on current L-T4 dose was 1.6 years. Baseline data from these subjects have been published (28). During the run-in, 92 subjects (67%) switched from generic to branded L-T4, and 36 (26%) needed L-T4 dose adjustment.
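The enrollment counts reported above can be cross-checked with simple arithmetic. The sketch below only re-adds numbers stated in the text; the function name `enrollment_flow` and the dictionary keys are labels chosen here for illustration.

```python
def enrollment_flow():
    """Recompute the subject flow from the counts given in the text."""
    screened = 197
    excluded = {
        "LDL >160 mg/dL": 11,
        "glucose >120 mg/dL": 2,
        "elevated serum calcium": 1,
        "abnormal electrocardiogram": 3,
        "TSH out of range": 5,
        "medical issues": 2,
    }
    direct_to_baseline = 50
    entered_run_in = 123
    withdrew_during_run_in = 17 + 2 + 1 + 2   # reasons listed in the text
    withdrew_before_final = 5 + 6 + 1 + 1     # reasons listed in the text

    eligible = screened - sum(excluded.values())
    completed_baseline = eligible - withdrew_during_run_in
    completed_study = completed_baseline - withdrew_before_final
    return {
        "excluded": sum(excluded.values()),
        "eligible": eligible,
        "eligible_by_route": direct_to_baseline + entered_run_in,
        "completed_baseline": completed_baseline,
        "withdrew_by_visit": 7 + 4 + 2,        # 6-, 12-, and 18-week visits
        "withdrew_before_final": withdrew_before_final,
        "completed_study": completed_study,
    }


flow = enrollment_flow()
print(flow["completed_baseline"], flow["completed_study"])
```

Under these counts the flow is internally consistent: 173 eligible subjects (50 + 123), 151 completing the baseline visit, and 138 completing the study.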
Percentages of subjects needing a run-in or dose adjustment did not differ between the three arms (P = 0.50). At baseline, 87 subjects (63%) had low-normal TSH and 51 (37%) had high-normal TSH levels. Nineteen subjects (14%) did not need L-T4 dose adjustments at interim visits, and 119 (86%) needed 1 to 5 additional dose adjustments (mean 2.1).

Forty-five subjects (33%) did not achieve their intended target TSH ranges (17%, 64%, and 16% in the low-normal, high-normal, and mildly elevated TSH arms, respectively). It was particularly difficult to maintain subjects in the high-normal TSH arm, because small changes in TSH levels near the lower or upper cutoffs of this arm moved subjects into one of the other two arms. For this reason, we conducted two separate analyses, one as intention-to-treat by randomized arm and one based on actual TSH levels at the end-of-study visit. Results are presented for the intention-to-treat analysis first, followed by the actual end arm analysis.

By intention to treat, subjects in the three arms did not differ in age, WAIS-R score, years in school, sex, estrogen status, ethnicity, BMI, or duration at current L-T4 dose (Table 1). Duration of L-T4 treatment was longer in the high-normal TSH arm (P < 0.001). Mean L-T4 doses at the end of the study were progressively lower in the three arms (1.50 ± 0.07, 1.32 ± 0.07, and 0.78 ± 0.08 μg/kg/day, respectively, P < 0.001), whereas mean TSH levels were progressively higher (1.85 ± 0.25, 3.93 ± 0.38, and 9.49 ± 0.80 mU/L, P < 0.001). Mean fT4 levels were lower in the mildly elevated TSH arm (1.79 ± 0.06, 1.64 ± 0.07, and 1.34 ± 0.05 ng/dL, respectively, P < 0.001), whereas mean fT3 levels were not significantly different between the three arms (201.4 ± 6.0, 191.4 ± 6.2, and 184.1 ± 6.6 pg/dL, P = 0.15). Seventy-two subjects (52%) had low baseline fT3 levels (118 to 209 pg/dL).
At the end of the study, 28 subjects in the low-normal TSH arm (61%), 34 in the high-normal TSH arm (72%), and 34 in the mildly elevated TSH arm (76%) had low fT3 levels (82 to 209 pg/dL).

Table 1. Clinical Parameters and Thyroid Function Tests at Baseline and End of Study, Analyzed as Intention to Treat

| Parameter | Baseline | End of study: Arm 1, low-normal TSH | End of study: Arm 2, high-normal TSH | End of study: Arm 3, mildly elevated TSH | P |
|---|---|---|---|---|---|
| No. of subjects | 138 | 46 | 47 | 45 | |
| Age, y | 49.2 ± 1 | 49.5 ± 1.7 | 50.9 ± 1.8 | 49.3 ± 1.6 | 0.77 |
| WAIS-R score [a] | 10.9 ± 0.2 | 10.8 ± 0.3 | 11.0 ± 0.3 | 10.8 ± 0.3 | 0.88 |
| Years in school [a] | 15.9 ± 0.3 | 15.6 ± 0.5 | 16.1 ± 0.4 | 15.9 ± 0.4 | 0.74 |
| Sex [a]: Female | 91% | 41 (89.1%) | 41 (87.2%) | 43 (95.6%) | 0.45 |
| Sex [a]: Male | 9% | 5 (10.9%) | 6 (12.8%) | 2 (4.4%) | |
| Estrogen status [a]: Male | 9% | 5 (10.9%) | 6 (12.8%) | 2 (4.4%) | 0.50 |
| Estrogen status [a]: Prenone | 39% | 19 (41.3%) | 15 (31.9%) | 20 (44.4%) | |
| Estrogen status [a]: Preon | 9% | 4 (8.7%) | 5 (10.6%) | 4 (8.9%) | |
| Estrogen status [a]: Postnone | 38% | 18 (39.1%) | 19 (40.4%) | 15 (33.3%) | |
| Estrogen status [a]: Poston | 4% | 0 (0%) | 2 (4.3%) | 4 (8.9%) | |
| Ethnicity [a]: White | 92% | 44 (95.7%) | 41 (87.2%) | 42 (93.3%) | 0.35 |
| Ethnicity [a]: Other | 8% | 2 (4.3%) | 6 (12.8%) | 3 (6.7%) | |
| BMI, kg/m2 | 27.8 ± 0.5 | 28.6 ± 0.8 | 27.3 ± 0.8 | 27.6 ± 1.0 | 0.58 |
| L-T4 duration of treatment, y [a] | 11.9 ± 0.8 | 9.9 ± 1.3 | 15.9 ± 1.7 | 9.7 ± 0.9 | <0.001 [b,c] |
| L-T4 duration at current dose, y [a] | 1.63 ± 0.19 | 1.75 ± 0.35 | 1.50 ± 0.18 | 1.63 ± 0.40 | 0.86 |
| L-T4 dose, μg/kg | 1.44 ± 0.04 | 1.50 ± 0.07 | 1.32 ± 0.07 | 0.78 ± 0.08 | <0.001 [c,d] |
| L-T4 dose change, μg/kg | | 0.14 ± 0.02 | −0.21 ± 0.03 | −0.64 ± 0.05 | <0.001 [b,c,d] |
| TSH, mU/L | 2.21 ± 0.13 | 1.85 ± 0.25 | 3.93 ± 0.38 | 9.49 ± 0.80 | <0.001 [b,c,d] |
| TSH change, mU/L | | −0.18 ± 0.33 | 1.60 ± 0.42 | 7.23 ± 0.86 | <0.001 [c,d] |
| Free T4, ng/dL | 1.67 ± 0.03 | 1.79 ± 0.06 | 1.64 ± 0.07 | 1.34 ± 0.05 | <0.001 [c,d] |
| Free T4 change, ng/dL | | 0.13 ± 0.07 | −0.04 ± 0.07 | −0.32 ± 0.06 | <0.001 [c,d] |
| Free T3, pg/dL | 214 ± 4.2 | 201.4 ± 6.0 | 191.1 ± 6.2 | 184.1 ± 6.6 | 0.15 |
| Free T3 change, pg/dL | | −18.9 ± 9.3 | −14.4 ± 6.9 | −32.4 ± 7.9 | 0.27 |

Values are mean ± standard error of the mean. Change variables represent the differences between end of study and baseline for each arm. Differences between arms were tested with analysis of variance, and follow-up post hoc Tukey multiple comparisons were used to determine which arms were significantly different at the 5% level.
Abbreviations: Postnone, postmenopausal, no hormone treatment; Poston, postmenopausal on hormone treatment; Prenone, premenopausal, no hormone treatment; Preon, premenopausal on hormone treatment.
[a] Values are at baseline for each arm.
[b] Arm 1 vs Arm 2.
[c] Arm 2 vs Arm 3.
[d] Arm 1 vs Arm 3.

Health status and mood by intention to treat

At the end of the study, SF-36 Physical Functioning and POMS anger subscales were higher in the high-normal TSH compared with the low-normal TSH arm (49% vs 26% high, P = 0.03; 4.9 ± 0.7 vs 3.6 ± 0.6, P = 0.03), but these differences were not significant after correction for multiple testing. There were no other differences between the three arms in health status or mood measures (Table 2). Analyzing TSH, fT4, and fT3 as continuous variables, the SF-36 Mental Health subscale decreased by 0.33 point for each 1-mU/L increase in TSH (P = 0.05). There were no significant correlations between TSH, fT4, or fT3 and other health status or mood measures (Table 3).

Table 2. End-of-Study Health Status and Mood Measures for Each Arm, Analyzed by Intention to Treat

| Measure | Arm 1, low-normal TSH | Arm 2, high-normal TSH | Arm 3, mildly elevated TSH | P |
|---|---|---|---|---|
| Billewicz: Billewicz Score | 2.8 ± 0.4 | 4.0 ± 0.5 | 4.1 ± 0.4 | 0.38 |
| Thyroid Disease Questionnaire: weighted average | −1.6 ± 0.2 | −1.4 ± 0.2 | −1.9 ± 0.3 | 0.53 |
| SF-36: Mental component summary | 41.0 ± 0.9 | 39.8 ± 1.1 | 39.4 ± 1.0 | 0.66 |
| SF-36: Physical component summary | 48.5 ± 0.7 | 50.6 ± 0.7 | 49.2 ± 0.8 | 0.28 |
| SF-36: General health | 62.2 ± 1.6 | 64.3 ± 1.7 | 62.6 ± 2.1 | 0.86 |
| SF-36: Mental health | 68.6 ± 1.6 | 66.0 ± 1.7 | 65.1 ± 1.7 | 0.12 |
| SF-36: Vitality | 45.6 ± 2.7 | 50.5 ± 2.5 | 44.6 ± 2.7 | 0.36 |
| SF-36: Bodily pain (BP) [a] | 21% high | 23% high | 33% high | 0.69 |
| SF-36: Physical functioning (PF) [a] | 26% high | 49% high | 31% high | 0.03 [b] |
| SF-36: Role physical (RP) [a] | 62% high | 79% high | 55% high | 0.08 |
| SF-36: Social functioning (SF) [a] | 62% high | 53% high | 52% high | 0.62 |
| SF-36: Role emotional (RE) [a] | 74% high | 74% high | 62% high | 0.43 |
| POMS [c]: Anger | 3.6 ± 0.6 | 4.9 ± 0.7 | 4.3 ± 0.6 | 0.03 [b] |
| POMS [c]: Confusion | 6.1 ± 0.5 | 6.6 ± 0.5 | 6.0 ± 0.4 | 0.87 |
| POMS [c]: Depression | 4.4 ± 0.9 | 5.2 ± 0.9 | 5.4 ± 0.8 | 0.83 |
| POMS [c]: Fatigue | 7.4 ± 0.8 | 6.7 ± 0.9 | 6.9 ± 0.8 | 0.42 |
| POMS [c]: Tension | 5.2 ± 0.5 | 6.7 ± 0.5 | 7.2 ± 0.7 | 0.22 |
| POMS [c]: Vigor | 14.9 ± 0.9 | 16.8 ± 0.9 | 14.8 ± 0.9 | 0.83 |
| Affective Lability Score: Bipolar | 0.6 ± 0.1 | 0.7 ± 0.1 | 0.7 ± 0.1 | 0.42 |
| Affective Lability Score: Depression | 0.9 ± 0.1 | 0.9 ± 0.1 | 1.0 ± 0.1 | 0.69 |
| Affective Lability Score: Elation | 0.7 ± 0.1 | 0.8 ± 0.1 | 0.8 ± 0.1 | 0.89 |
| Affective Lability Score: Anger [d] | 61% score >0 | 57% score >0 | 62% score >0 | 0.67 |
| Affective Lability Score: Anxiety [d] | 67% score >0 | 83% score >0 | 84% score >0 | 0.21 |
| Affective Lability Score: Anxiety Depression [d] | 63% score >0 | 66% score >0 | 76% score >0 | 0.58 |

Values are mean ± standard error of the mean. Significant differences between arms are shown in bold. P values for continuous outcomes were adjusted for age, years of education, WAIS-R score, BMI at baseline, change in BMI, sex and estrogen status, baseline TSH (low-normal vs high-normal), standard deviation of TSH values at all but first visit, baseline L-T4 dose, time on L-T4, time on L-T4 dose, and baseline value of outcome. P values for binary outcomes were adjusted for baseline TSH (low-normal vs high-normal) and baseline value of outcome. SF-36 and POMS outcomes were grouped, and multiple testing P value adjustments were made for these outcomes.
[a] For BP, PF, RP, SF, and RE, the distributions within each group were highly skewed. We used the highest observed values of the skewed SF-36 subscales as the cutoff points for producing a dichotomous measure. The highest observed values were BP 75, PF 100, RP 50, SF 80, and RE 50.
[b] Arms 1 and 2 were significantly different at the 5% level based on follow-up pairwise comparisons of all three arms based on Tukey multiple comparisons. However, these P values were not significant when we grouped outcomes from the same test together and applied multiple testing adjustments by using a Bonferroni correction to all individual Tukey adjusted P values comparing the three arms.
[c] POMS values (except for the vigor subscale) were natural log-transformed before analysis because the raw data were skewed. All values were increased by 1 before the transformation because of the presence of zeros.
[d] These scores were compared as the proportion positive between the groups, because the measures were skewed and contained a large proportion of zeros.
b Arms 1 and 2 were significantly different at the 5% level based on follow-up pairwise comparisons of all three arms based on Tukey multiple comparisons. However, these P values were not significant when we grouped outcomes from the same test together and applied multiple testing adjustments by using a Bonferroni correction to all individual Tukey adjusted P values comparing the three arms. c POMS values (except for the vigor subscale) were natural log-transformed before analysis because the raw data were skewed. All values were increased by 1 before the transformation because of the presence of zeros. d These scores were compared as the proportion positive between the groups, because the measures were skewed and contained a large proportion of zeros. View Large Table 3. Correlations Between Changes in Thyroid Hormone Levels and Health Status and Mood Measures at End of Study fT4 fT3 TSH Measure Coefficient P Coefficient P Coefficient P Billewicz  Billewicz Score −0.13 (−1.2 to 0.95) 0.82 0.07 (−0.04 to 0.18) 0.20 0.04 (−0.07 to 0.14) 0.49 Thyroid Disease Questionnaire  Thyroid Disease Questionnaire weighted average 0.18 (−0.26 to 0.62) 0.42 0.02 (−0.03 to 0.06) 0.41 −0.04 (−0.08 to 0.00) 0.07 SF-36  Mental component summary 0.22 (−1.96 to 2.41) 0.84 0.11 (−0.13 to 0.34) 0.37 −0.05 (−0.27 to 0.17) 0.66  Physical component summary 0.40 (−1.27 to 2.06) 0.64 0.03 (−0.15 to 0.20) 0.75 −0.01 (−0.18 to 0.15) 0.89  General health 0.24 (−3.35 to 3.84) 0.89 0.09 (−0.29 to 0.47) 0.65 0.04 (−0.32 to 0.41) 0.82  Mental health 1.97 (−1.36 to 5.30) 0.24 0.19 (−0.17 to 0.54) 0.30 −0.33 (−0.66 to 0.00) 0.05  Vitality 1.67 (−3.30 to 6.63) 0.51 0.20 (−0.34 to 0.74) 0.46 −0.14 (−0.65 to 0.36) 0.57  Bodily pain (BP)a −2% (−64% to 161%) 0.96 −8% (−20% to 3%) 0.17 3% (−6% to 13%) 0.53  Physical functioning (PF)a 72% (−35% to 378%) 0.28 3% (−7% to 16%) 0.54 −4% (−13% to 5%) 0.40  Role physical (RP)a 35% (−43% to 239%) 0.50 3% (−7% to 14%) 0.57 −2% (−10% to 6%) 0.61  Social functioning 
(SF)a 7% (−59% to 183%) 0.90 7% (−4% to 19%) 0.24 0% (−8% to 9%) 0.96  Role emotional (RE)a 93% (−26% to 451%) 0.19 7% (−3% to 19%) 0.20 −5% (−12% to 4%) 0.27 POMSb  Anger −0.25 (−0.55 to 0.06) 0.11 −0.0006 (−0.0319 to 0.0306) 0.97 0.02 (−0.01 to 0.05) 0.21  Confusion −0.04 (−0.20 to 0.13) 0.64 0.001 (−0.016 to 0.018) 0.89 0.004 (−0.012 to 0.021) 0.58  Depression −0.17 (−0.50 to 0.17) 0.32 −0.01 (−0.04 to 0.03) 0.72 0.01 (−0.02 to 0.04) 0.51  Fatigue −0.002 (−0.286 to 0.282) 0.99 −0.003 (−0.032 to 0.026) 0.84 0.002 (−0.025 to 0.03) 0.86  Tension −0.14 (−0.31 to 0.04) 0.13 −0.01 (−0.03 to 0.01) 0.26 0.005 (−0.012 to 0.022) 0.58  Vigor −0.54 (−2.54 to 1.45) 0.59 −0.09 (−0.29 to 0.11) 0.37 0.03 (−0.16 to 0.23) 0.73 Affective Lability Score  Bipolar 0.02 (−0.15 to 0.18) 0.85 −0.002 (−0.018 to 0.014) 0.80 −0.002 (−0.018 to 0.014) 0.79  Depression −0.05 (−0.22 to 0.12) 0.60 −0.01 (−0.02 to 0.01) 0.48 0.003 (−0.013 to 0.020) 0.71  Elation −0.02 (−0.18 to 0.14) 0.80 −0.005 (−0.021 to 0.011) 0.56 0.0004 (−0.0153 to 0.016) 0.96  Angerc 29% (−48% to 233%) 0.59 −1% (−10% to 9%) 0.83 4% (−4% to 13%) 0.41  Anxietyc 2% (−67% to 231%) 0.97 −4% (−15% to 8%) 0.49 5% (−5% to 16%) 0.37  Anxiety Depressionc −33% (−80% to 111%) 0.51 2% (−10% to 15%) 0.81 0% (−9% to 11%) >0.99 fT4 fT3 TSH Measure Coefficient P Coefficient P Coefficient P Billewicz  Billewicz Score −0.13 (−1.2 to 0.95) 0.82 0.07 (−0.04 to 0.18) 0.20 0.04 (−0.07 to 0.14) 0.49 Thyroid Disease Questionnaire  Thyroid Disease Questionnaire weighted average 0.18 (−0.26 to 0.62) 0.42 0.02 (−0.03 to 0.06) 0.41 −0.04 (−0.08 to 0.00) 0.07 SF-36  Mental component summary 0.22 (−1.96 to 2.41) 0.84 0.11 (−0.13 to 0.34) 0.37 −0.05 (−0.27 to 0.17) 0.66  Physical component summary 0.40 (−1.27 to 2.06) 0.64 0.03 (−0.15 to 0.20) 0.75 −0.01 (−0.18 to 0.15) 0.89  General health 0.24 (−3.35 to 3.84) 0.89 0.09 (−0.29 to 0.47) 0.65 0.04 (−0.32 to 0.41) 0.82  Mental health 1.97 (−1.36 to 5.30) 0.24 0.19 (−0.17 to 0.54) 0.30 −0.33 (−0.66 to 0.00) 
0.05  Vitality 1.67 (−3.30 to 6.63) 0.51 0.20 (−0.34 to 0.74) 0.46 −0.14 (−0.65 to 0.36) 0.57  Bodily pain (BP)a −2% (−64% to 161%) 0.96 −8% (−20% to 3%) 0.17 3% (−6% to 13%) 0.53  Physical functioning (PF)a 72% (−35% to 378%) 0.28 3% (−7% to 16%) 0.54 −4% (−13% to 5%) 0.40  Role physical (RP)a 35% (−43% to 239%) 0.50 3% (−7% to 14%) 0.57 −2% (−10% to 6%) 0.61  Social functioning (SF)a 7% (−59% to 183%) 0.90 7% (−4% to 19%) 0.24 0% (−8% to 9%) 0.96  Role emotional (RE)a 93% (−26% to 451%) 0.19 7% (−3% to 19%) 0.20 −5% (−12% to 4%) 0.27 POMSb  Anger −0.25 (−0.55 to 0.06) 0.11 −0.0006 (−0.0319 to 0.0306) 0.97 0.02 (−0.01 to 0.05) 0.21  Confusion −0.04 (−0.20 to 0.13) 0.64 0.001 (−0.016 to 0.018) 0.89 0.004 (−0.012 to 0.021) 0.58  Depression −0.17 (−0.50 to 0.17) 0.32 −0.01 (−0.04 to 0.03) 0.72 0.01 (−0.02 to 0.04) 0.51  Fatigue −0.002 (−0.286 to 0.282) 0.99 −0.003 (−0.032 to 0.026) 0.84 0.002 (−0.025 to 0.03) 0.86  Tension −0.14 (−0.31 to 0.04) 0.13 −0.01 (−0.03 to 0.01) 0.26 0.005 (−0.012 to 0.022) 0.58  Vigor −0.54 (−2.54 to 1.45) 0.59 −0.09 (−0.29 to 0.11) 0.37 0.03 (−0.16 to 0.23) 0.73 Affective Lability Score  Bipolar 0.02 (−0.15 to 0.18) 0.85 −0.002 (−0.018 to 0.014) 0.80 −0.002 (−0.018 to 0.014) 0.79  Depression −0.05 (−0.22 to 0.12) 0.60 −0.01 (−0.02 to 0.01) 0.48 0.003 (−0.013 to 0.020) 0.71  Elation −0.02 (−0.18 to 0.14) 0.80 −0.005 (−0.021 to 0.011) 0.56 0.0004 (−0.0153 to 0.016) 0.96  Angerc 29% (−48% to 233%) 0.59 −1% (−10% to 9%) 0.83 4% (−4% to 13%) 0.41  Anxietyc 2% (−67% to 231%) 0.97 −4% (−15% to 8%) 0.49 5% (−5% to 16%) 0.37  Anxiety Depressionc −33% (−80% to 111%) 0.51 2% (−10% to 15%) 0.81 0% (−9% to 11%) >0.99 Correlations [95% confidence intervals (CIs)] with continuous outcomes were modeled with multiple linear regressions adjusting for age, years of education, WAIS-R score, BMI at baseline, change in BMI, sex and estrogen status, standard deviation of TSH values at all but first visit, baseline LT-4 dose, time on LT-4, time on LT-4 dose, 
baseline hormone value, and baseline value of outcome. The magnitude of the coefficient indicates the estimated change in the outcome for each 1-unit increase in the study change (end of study – baseline) of fT4 or TSH and a 10-unit increase in the study change of fT3. Correlations (95% CIs) with binary outcomes were modeled with multiple logistic regressions adjusting for the baseline value of the hormone and the outcome of interest. Coefficients were transformed to estimate the percentage change in the predicted odds of the measure for a 1-unit increase in the study change (end of study – baseline) of fT4 or TSH and a 10-unit increase in the study change of fT3. The transformed coefficients are estimates of the risk ratios associated with the 1- or 10-unit increase in the study change of respective hormone level. A positive coefficient indicates that the measure increased with increasing hormone levels, whereas a negative coefficient indicates that the measure decreased with increasing hormone levels. Separate models were run for each hormone. Significant coefficients are shown in bold with corresponding P values. a For these variables, the distributions within each group were highly skewed. We used the highest observed values of the skewed SF-36 subscales as the cutoff points for producing a dichotomous measure. The highest observed values were BP 75, PF 100, RP 50, SF-80, and RE 50. b POMS values (except for the vigor subscale) were natural log-transformed before analysis because the raw data were skewed. All values were increased by 1 before the transformation because of the presence of zeros. The magnitude of the coefficient indicates the estimated change in the natural log of the measure plus 1 with a 1-unit (10 units for fT3) increase. c These scores were compared as the proportion positive between the groups, because the measures were skewed and contained a large proportion of zeros. View Large Table 3. 
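The two transformations described in the Table 3 footnotes can be illustrated with a short sketch. This is not the authors' analysis code. The function names are hypothetical, and the sketch assumes the reported percentages follow the standard conversion from a logistic regression coefficient β to a percentage change in odds, 100 × (exp(β) − 1), and that the POMS transform is ln(score + 1).

```python
import math


def pct_change_in_odds(beta: float) -> float:
    """Percentage change in odds per unit increase, from a logit coefficient.

    Assumed conversion (not stated explicitly in the source): 100 * (exp(beta) - 1).
    """
    return 100.0 * (math.exp(beta) - 1.0)


def logit_coef_from_pct(pct: float) -> float:
    """Inverse of pct_change_in_odds: recover the logit coefficient."""
    return math.log(1.0 + pct / 100.0)


def log_plus_one(score: float) -> float:
    """Natural log of (score + 1), so that zero scores remain defined."""
    return math.log1p(score)


# Role emotional (RE) vs fT4 in Table 3 is reported as 93%, which under the
# assumed conversion corresponds to a logit coefficient of about 0.66.
beta_re = logit_coef_from_pct(93.0)
print(round(beta_re, 2))

# A POMS score of 0 maps to 0 on the transformed scale.
print(log_plus_one(0.0))
```

Under this reading, a value such as −33% means the odds of a positive score are multiplied by 0.67 for each 1-unit increase in the hormone change, and the wide intervals in the fT4 column reflect how imprecise those odds estimates are.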
Cognitive tests by intention to treat

At the end of the study, the Letter Cancellation Test percentage with no errors and 1-back number correct on target were worse in the mildly elevated TSH compared with the low-normal TSH arm (11% vs 30%, P = 0.02; 68% vs 84%, P = 0.02), but these differences were not significant after correction for multiple testing. There were no other differences between the three arms in cognitive outcomes (Table 4). With TSH, fT4, and fT3 analyzed as continuous variables, there were a few correlations between fT4 or fT3 and individual outcomes, but only one remained significant after correction for multiple comparisons (Pursuit Rotor Trial 3 time on target inversely related to fT4 levels) (Table 5).
Table 4. End-of-Study Cognitive Measures for Each Arm Analyzed by Intention to Treat

| Test | Arm 1 Low-Normal TSH | Arm 2 High-Normal TSH | Arm 3 Mildly Elevated TSH | P |
|---|---|---|---|---|
| Executive function | | | | |
| Letter Cancellation Test: time, s | 102.0 ± 3.3 | 102.6 ± 3.1 | 100.4 ± 3.0 | 0.64 |
| Letter Cancellation Test: % with no errors | 30% | 26% | 11% | 0.02^a |
| Trail Making Test: time, s | 23.3 ± 1.0 | 23.2 ± 1.0 | 21.9 ± 0.8 | 0.26 |
| Trail Making Test: ABC time, s | 56.7 ± 2.8 | 59.9 ± 3.1 | 54.9 ± 2.9 | 0.70 |
| Trail Making Test: % with errors | 11% | 9% | 11% | 0.91 |
| Trail Making Test: % with ABC errors | 28% | 26% | 22% | 0.76 |
| Iowa Gambling Test: Net-1 | 0.7 ± 1.6 | −0.7 ± 1.1 | −2.4 ± 1.1 | 0.48 |
| Iowa Gambling Test: Net-2 | 5.4 ± 1.3 | 4.9 ± 1.3 | 4.0 ± 1.3 | 0.73 |
| Iowa Gambling Test: Net-3 | 7.2 ± 1.6 | 8.2 ± 1.3 | 6.8 ± 1.5 | 0.60 |
| Iowa Gambling Test: Net-4 | 9.5 ± 1.6 | 7.5 ± 1.4 | 9.1 ± 1.5 | 0.58 |
| Iowa Gambling Test: Net-5 | 7.9 ± 1.6 | 7.5 ± 1.5 | 8.8 ± 1.6 | 0.84 |
| Working memory | | | | |
| N-Back number correct on target: 1-Back^b | 84% | 75% | 68% | 0.02^a |
| N-Back number correct on target: 2-Back^b | 65% | 65% | 70% | 0.81 |
| N-Back number correct on target: 3-Back | 11.7 ± 0.3 | 11.5 ± 0.4 | 11.5 ± 0.4 | 0.75 |
| N-Back number incorrect nontarget: 1-Back^b | 14% | 18% | 20% | 0.43 |
| N-Back number incorrect nontarget: 2-Back^b | 23% | 25% | 34% | 0.31 |
| N-Back number incorrect nontarget: 3-Back | 3.0 ± 0.2 | 3.5 ± 0.3 | 3.1 ± 0.3 | 0.38 |
| Subject-Ordered Pointing errors: 6 | 0.4 ± 0.1 | 0.5 ± 0.1 | 0.4 ± 0.1 | 0.68 |
| Subject-Ordered Pointing errors: 8 | 0.9 ± 0.1 | 0.8 ± 0.1 | 0.9 ± 0.1 | 0.73 |
| Subject-Ordered Pointing errors: 10 | 1.1 ± 0.1 | 1.1 ± 0.1 | 1.1 ± 0.1 | 0.60 |
| Subject-Ordered Pointing errors: 12 | 1.5 ± 0.2 | 1.5 ± 0.2 | 1.5 ± 0.2 | 0.80 |
| Declarative memory | | | | |
| Paragraph Recall: immediate | 13.3 ± 0.4 | 13.5 ± 0.5 | 14.0 ± 0.4 | 0.70 |
| Paragraph Recall: 30-min delay | 11.9 ± 0.4 | 12.1 ± 0.5 | 12.8 ± 0.4 | 0.68 |
| Motor learning | | | | |
| Pursuit Rotor time on target, s: Trial 1 | 38.7 ± 2.1 | 34.8 ± 2.0 | 41.5 ± 2.2 | 0.51 |
| Pursuit Rotor time on target, s: Trial 2 | 39.6 ± 2.3 | 37.0 ± 2.0 | 43.8 ± 2.1 | 0.58 |
| Pursuit Rotor time on target, s: Trial 3 | 39.9 ± 2.3 | 37.4 ± 1.9 | 44.0 ± 2.2 | 0.44 |
| Pursuit Rotor time on target, s: Trial 4 | 41.8 ± 2.3 | 39.4 ± 1.9 | 45.0 ± 2.2 | 0.68 |
| Motor Sequence Learning Test: total movement time, s | 1114.4 ± 43.7 | 1125.3 ± 36.8 | 1019.6 ± 35.9 | 0.46 |

Values are mean ± standard error of the mean. Significant differences between arms are shown in bold. Individual tests are grouped by cognitive subdomains (first column). P values for continuous outcomes were adjusted for age, years of education, WAIS-R score, BMI at baseline, change in BMI, sex and estrogen status, baseline TSH (low-normal vs high-normal), standard deviation of TSH values at interim and last visits, baseline L-T4 dose, time on L-T4, time on L-T4 dose, and baseline value of outcome. P values for binary outcomes were adjusted for baseline TSH (low-normal vs high-normal) and baseline value of outcome. Letter Cancellation and N-Back outcomes were grouped, and multiple testing P value adjustments were made for these outcomes.
^a Arms 1 and 3 were significantly different at the 5% level based on follow-up pairwise comparisons of all three arms based on Tukey multiple comparisons. However, these P values were not significant when we grouped outcomes from the same test together and applied multiple testing adjustments by using Bonferroni correction to all individual Tukey adjusted P values comparing the three arms.
^b These values were calculated as proportion of subjects for whom each measure was ≥15 (for correct on target) or >0 (for incorrect nontarget), because there were floor effects.

Table 5. Correlations Between Changes in Thyroid Hormone Levels and Cognitive Measures at End of Study

| Test | fT4 Coefficient | fT4 P | fT3 Coefficient | fT3 P | TSH Coefficient | TSH P |
|---|---|---|---|---|---|---|
| Executive function | | | | | | |
| Letter Cancellation Test: time, s | 1.85 (−5.37 to 9.07) | 0.61 | 0.60 (−0.14 to 1.33) | 0.11 | −0.09 (−0.80 to 0.62) | 0.80 |
| Letter Cancellation Test: % with no errors | 13% (−57% to 182%) | 0.80 | −3% (−13% to 7%) | 0.56 | −9% (−20% to 2%) | 0.13 |
| Trail Making Test: time, s | 1.04 (−1.04 to 3.13) | 0.32 | −0.10 (−0.31 to 0.11) | 0.34 | −0.03 (−0.24 to 0.17) | 0.75 |
| Trail Making Test: ABC time, s | −4.14 (−10.33 to 2.05) | 0.19 | 0.12 (−0.52 to 0.75) | 0.72 | −0.04 (−0.65 to 0.57) | 0.89 |
| Trail Making Test: % with errors | −23% (−82% to 177%) | 0.70 | −5% (−18% to 8%) | 0.45 | 3% (−10% to 14%) | 0.66 |
| Trail Making Test: % with ABC errors | −11% (−66% to 121%) | 0.81 | −8% (−18% to 1%) | 0.10 | −5% (−14% to 4%) | 0.29 |
| Iowa Gambling Test: Net-1 | −0.24 (−3.91 to 3.43) | 0.90 | −0.40 (−0.77 to −0.03) | 0.03^a | 0.09 (−0.27 to 0.46) | 0.61 |
| Iowa Gambling Test: Net-2 | −1.73 (−5.31 to 1.84) | 0.34 | −0.23 (−0.60 to 0.14) | 0.21 | −0.01 (−0.36 to 0.34) | 0.97 |
| Iowa Gambling Test: Net-3 | 0.49 (−3.09 to 4.08) | 0.79 | 0.09 (−0.28 to 0.45) | 0.65 | 0.10 (−0.25 to 0.45) | 0.58 |
| Iowa Gambling Test: Net-4 | −1.88 (−5.72 to 1.97) | 0.34 | −0.51 (−0.89 to −0.12) | 0.01^a | 0.001 (−0.374 to 0.377) | >0.99 |
| Iowa Gambling Test: Net-5 | −0.39 (−4.25 to 3.48) | 0.84 | −0.21 (−0.61 to 0.19) | 0.30 | 0.07 (−0.31 to 0.44) | 0.73 |
| Working memory | | | | | | |
| N-Back number correct on target: 1-Back^b | 252% (18% to 1118%) | 0.03^a | 9% (−2% to 23%) | 0.14 | −7% (−14% to 2%) | 0.11 |
| N-Back number correct on target: 2-Back^b | 1% (−60% to 159%) | >0.99 | 7% (−3% to 20%) | 0.20 | −2% (−10% to 8%) | 0.70 |
| N-Back number correct on target: 3-Back | 0.17 (−0.76 to 1.10) | 0.71 | −0.04 (−0.14 to 0.05) | 0.36 | 0.05 (−0.04 to 0.14) | 0.31 |
| N-Back number incorrect nontarget: 1-Back^b | −22% (−77% to 136%) | 0.67 | 0% (−11% to 11%) | 0.95 | 3% (−7% to 14%) | 0.51 |
| N-Back number incorrect nontarget: 2-Back^b | 27% (−52% to 236%) | 0.63 | 3% (−8% to 14%) | 0.63 | 5% (−4% to 14%) | 0.29 |
| N-Back number incorrect nontarget: 3-Back | −0.28 (−1.07 to 0.51) | 0.48 | 0.05 (−0.04 to 0.13) | 0.27 | −0.01 (−0.09 to 0.06) | 0.74 |
| Subject-Ordered Pointing errors: 6 | 0.02 (−0.14 to 0.18) | 0.80 | −0.01 (−0.02 to 0.01) | 0.52 | 0.01 (−0.01 to 0.02) | 0.27 |
| Subject-Ordered Pointing errors: 8 | 0.11 (−0.16 to 0.38) | 0.42 | 0.0003 (−0.0280 to 0.0286) | 0.98 | 0.002 (−0.024 to 0.029) | 0.88 |
| Subject-Ordered Pointing errors: 10 | 0.10 (−0.14 to 0.35) | 0.42 | −0.003 (−0.029 to 0.023) | 0.82 | 0.02 (−0.01 to 0.04) | 0.18 |
| Subject-Ordered Pointing errors: 12 | 0.05 (−0.28 to 0.39) | 0.75 | −0.02 (−0.05 to 0.02) | 0.27 | −0.003 (−0.036 to 0.029) | 0.83 |
| Declarative memory | | | | | | |
| Paragraph Recall: immediate | −0.33 (−1.33 to 0.67) | 0.52 | −0.02 (−0.12 to 0.08) | 0.71 | 0.04 (−0.05 to 0.14) | 0.38 |
| Paragraph Recall: 30-min delay | −0.51 (−1.53 to 0.51) | 0.32 | 0.01 (−0.10 to 0.11) | 0.86 | 0.08 (−0.02 to 0.18) | 0.11 |
| Motor learning | | | | | | |
| Pursuit Rotor time on target, s: Trial 1 | −5.13 (−9.77 to −0.48) | 0.03^a | 0.10 (−0.40 to 0.59) | 0.70 | 0.18 (−0.30 to 0.65) | 0.47 |
| Pursuit Rotor time on target, s: Trial 2 | −5.78 (−10.62 to −0.93) | 0.02^a | 0.16 (−0.35 to 0.68) | 0.53 | 0.22 (−0.29 to 0.72) | 0.40 |
| Pursuit Rotor time on target, s: Trial 3 | −6.70 (−11.25 to −2.14) | 0.004^c | 0.09 (−0.40 to 0.58) | 0.72 | 0.30 (−0.17 to 0.77) | 0.21 |
| Pursuit Rotor time on target, s: Trial 4 | −2.19 (−7.09 to 2.71) | 0.38 | 0.10 (−0.41 to 0.61) | 0.70 | 0.15 (−0.34 to 0.65) | 0.54 |
| Motor Sequence Learning Test: total movement time, s | 31.00 (−14.82 to 76.82) | 0.18 | 2.31 (−2.45 to 7.07) | 0.34 | −1.08 (−5.74 to 3.58) | 0.65 |

Correlations [95% confidence intervals (CIs)] with continuous outcomes were modeled with multiple linear regressions adjusting for age, years of education, WAIS-R score, BMI at baseline, change in BMI, sex and estrogen status, standard deviation of TSH values at all but first visit, baseline L-T4 dose, time on L-T4, time on L-T4 dose, baseline hormone value, and baseline value of outcome. The magnitude of the coefficient indicates the estimated change in the outcome for each 1-unit increase in the study change (end of study − baseline) of fT4 or TSH and a 10-unit increase in the study change of fT3. Correlations (95% CIs) with binary outcomes were modeled with multiple logistic regressions adjusting for baseline values of the hormone and the outcome of interest. Coefficients were transformed to estimate the percentage change in the predicted odds of the measure for a 1-unit increase in the study change (end of study − baseline) of fT4 or TSH, and a 10-unit increase in the study change of fT3. The transformed coefficients are estimates of the risk ratios associated with the 1- or 10-unit increase in the study change of respective hormone level. A positive coefficient indicates that the measure increased with increasing hormone levels, whereas a negative coefficient indicates that the measure decreased with increasing hormone levels. Separate models were run for each hormone. Significant coefficients are shown in bold with corresponding unadjusted P values. For each set of related outcome measures, multiple testing adjustments were applied to all the individual P values from models adjusting for the same hormone type.
^a P was not significant at the 0.05 level when we applied multiple testing adjustments grouping outcomes from the same test together by Bonferroni correction.
^b These values were calculated as proportion of subjects for whom each measure was ≥15 (for correct on target) or >0 (for incorrect nontarget), because there were floor effects.
^c P was still significant at the 0.05 level when we applied multiple testing adjustments grouping outcomes from the same test together by Bonferroni correction.
Correlations Between Changes in Thyroid Hormone Levels and Cognitive Measures at End of Study fT4 fT3 TSH Test Coefficient P Coefficient P Coefficient P Executive function Letter Cancellation Test  Time, s 1.85 (−5.37 to 9.07) 0.61 0.60 (−0.14 to 1.33) 0.11 −0.09 (−0.80 to 0.62) 0.80  % With no errors 13% (−57% to 182%) 0.80 −3% (−13% to 7%) 0.56 −9% (−20% to 2%) 0.13 Trail Making Test  Time, s 1.04 (−1.04 to 3.13) 0.32 −0.10 (−0.31 to 0.11) 0.34 −0.03 (−0.24 to 0.17) 0.75  ABC time, s −4.14 (−10.33 to 2.05) 0.19 0.12 (−0.52 to 0.75) 0.72 −0.04 (−0.65 to 0.57) 0.89  % With errors −23% (−82% to 177%) 0.70 −5% (−18% to 8%) 0.45 3% (−10% to 14%) 0.66  % With ABC errors −11% (−66% to 121%) 0.81 −8% (−18% to 1%) 0.10 −5% (−14% to 4%) 0.29 Iowa Gambling Test  Net-1 −0.24 (−3.91 to 3.43) 0.90 −0.40 (−0.77 to −0.03) 0.03a 0.09 (−0.27 to 0.46) 0.61  Net-2 −1.73 (−5.31 to 1.84) 0.34 −0.23 (−0.60 to 0.14) 0.21 −0.01 (−0.36 to 0.34) 0.97  Net-3 0.49 (−3.09 to 4.08) 0.79 0.09 (−0.28 to 0.45) 0.65 0.10 (−0.25 to 0.45) 0.58  Net-4 −1.88 (−5.72 to 1.97) 0.34 -0.51 (−0.89 to −0.12) 0.01a 0.001 (−0.374 to 0.377) >0.99  Net-5 −0.39 (−4.25 to 3.48) 0.84 −0.21 (−0.61 to 0.19) 0.30 0.07 (−0.31 to 0.44) 0.73 Working memory N-Back number correct  on target  1-Backb 252% (18% to 1118%) 0.03a 9% (−2% to 23%) 0.14 −7% (−14% to 2%) 0.11  2-Backb 1% (−60% to 159%) >0.99 7% (−3% to 20%) 0.20 −2% (−10% to 8%) 0.70  3-Back 0.17 (−0.76 to 1.10) 0.71 −0.04 (−0.14 to 0.05) 0.36 0.05 (−0.04 to 0.14) 0.31 N-Back number  incorrect nontarget  1-Backb −22% (−77% to 136%) 0.67 0% (−11% to 11%) 0.95 3% (−7% to 14%) 0.51  2-Backb 27% (−52% to 236%) 0.63 3% (−8% to 14%) 0.63 5% (−4% to 14%) 0.29  3-Back −0.28 (−1.07 to 0.51) 0.48 0.05 (−0.04 to 0.13) 0.27 −0.01 (−0.09 to 0.06) 0.74 Subject-Ordered  Pointing errors  6 0.02 (−0.14 to 0.18) 0.80 −0.01 (−0.02 to 0.01) 0.52 0.01 (−0.01 to 0.02) 0.27  8 0.11 (−0.16 to 0.38) 0.42 0.0003 (−0.0280 to 0.0286) 0.98 0.002 (−0.024 to 0.029) 0.88  10 0.10 (−0.14 to 0.35) 
0.42 −0.003 (−0.029 to 0.023) 0.82 0.02 (−0.01 to 0.04) 0.18  12 0.05 (−0.28 to 0.39) 0.75 −0.02 (−0.05 to 0.02) 0.27 −0.003 (−0.036 to 0.029) 0.83 Declarative memory Paragraph Recall  Immediate −0.33 (−1.33 to 0.67) 0.52 −0.02 (−0.12 to 0.08) 0.71 0.04 (−0.05 to 0.14) 0.38  30-min delay −0.51 (−1.53 to 0.51) 0.32 0.01 (−0.10 to 0.11) 0.86 0.08 (−0.02 to 0.18) 0.11 Motor learning Pursuit Rotor Trial   Time on target, s  1 −5.13 (−9.77 to −0.48) 0.03a 0.10 (−0.40 to 0.59) 0.70 0.18 (−0.30 to 0.65) 0.47   2 −5.78 (−10.62 to −0.93) 0.02a 0.16 (−0.35 to 0.68) 0.53 0.22 (−0.29 to 0.72) 0.40   3 −6.70 (−11.25 to −2.14) 0.004c 0.09 (−0.40 to 0.58) 0.72 0.30 (−0.17 to 0.77) 0.21   4 −2.19 (−7.09 to 2.71) 0.38 0.10 (−0.41 to 0.61) 0.70 0.15 (−0.34 to 0.65) 0.54 Motor Sequence  Learning Test  Total movement  time, s 31.00 (−14.82 to 76.82) 0.18 2.31 (−2.45 to 7.07) 0.34 −1.08 (−5.74 to 3.58) 0.65 fT4 fT3 TSH Test Coefficient P Coefficient P Coefficient P Executive function Letter Cancellation Test  Time, s 1.85 (−5.37 to 9.07) 0.61 0.60 (−0.14 to 1.33) 0.11 −0.09 (−0.80 to 0.62) 0.80  % With no errors 13% (−57% to 182%) 0.80 −3% (−13% to 7%) 0.56 −9% (−20% to 2%) 0.13 Trail Making Test  Time, s 1.04 (−1.04 to 3.13) 0.32 −0.10 (−0.31 to 0.11) 0.34 −0.03 (−0.24 to 0.17) 0.75  ABC time, s −4.14 (−10.33 to 2.05) 0.19 0.12 (−0.52 to 0.75) 0.72 −0.04 (−0.65 to 0.57) 0.89  % With errors −23% (−82% to 177%) 0.70 −5% (−18% to 8%) 0.45 3% (−10% to 14%) 0.66  % With ABC errors −11% (−66% to 121%) 0.81 −8% (−18% to 1%) 0.10 −5% (−14% to 4%) 0.29 Iowa Gambling Test  Net-1 −0.24 (−3.91 to 3.43) 0.90 −0.40 (−0.77 to −0.03) 0.03a 0.09 (−0.27 to 0.46) 0.61  Net-2 −1.73 (−5.31 to 1.84) 0.34 −0.23 (−0.60 to 0.14) 0.21 −0.01 (−0.36 to 0.34) 0.97  Net-3 0.49 (−3.09 to 4.08) 0.79 0.09 (−0.28 to 0.45) 0.65 0.10 (−0.25 to 0.45) 0.58  Net-4 −1.88 (−5.72 to 1.97) 0.34 -0.51 (−0.89 to −0.12) 0.01a 0.001 (−0.374 to 0.377) >0.99  Net-5 −0.39 (−4.25 to 3.48) 0.84 −0.21 (−0.61 to 0.19) 0.30 0.07 (−0.31 to 
0.44) 0.73 Working memory N-Back number correct  on target  1-Backb 252% (18% to 1118%) 0.03a 9% (−2% to 23%) 0.14 −7% (−14% to 2%) 0.11  2-Backb 1% (−60% to 159%) >0.99 7% (−3% to 20%) 0.20 −2% (−10% to 8%) 0.70  3-Back 0.17 (−0.76 to 1.10) 0.71 −0.04 (−0.14 to 0.05) 0.36 0.05 (−0.04 to 0.14) 0.31 N-Back number  incorrect nontarget  1-Backb −22% (−77% to 136%) 0.67 0% (−11% to 11%) 0.95 3% (−7% to 14%) 0.51  2-Backb 27% (−52% to 236%) 0.63 3% (−8% to 14%) 0.63 5% (−4% to 14%) 0.29  3-Back −0.28 (−1.07 to 0.51) 0.48 0.05 (−0.04 to 0.13) 0.27 −0.01 (−0.09 to 0.06) 0.74 Subject-Ordered  Pointing errors  6 0.02 (−0.14 to 0.18) 0.80 −0.01 (−0.02 to 0.01) 0.52 0.01 (−0.01 to 0.02) 0.27  8 0.11 (−0.16 to 0.38) 0.42 0.0003 (−0.0280 to 0.0286) 0.98 0.002 (−0.024 to 0.029) 0.88  10 0.10 (−0.14 to 0.35) 0.42 −0.003 (−0.029 to 0.023) 0.82 0.02 (−0.01 to 0.04) 0.18  12 0.05 (−0.28 to 0.39) 0.75 −0.02 (−0.05 to 0.02) 0.27 −0.003 (−0.036 to 0.029) 0.83 Declarative memory Paragraph Recall  Immediate −0.33 (−1.33 to 0.67) 0.52 −0.02 (−0.12 to 0.08) 0.71 0.04 (−0.05 to 0.14) 0.38  30-min delay −0.51 (−1.53 to 0.51) 0.32 0.01 (−0.10 to 0.11) 0.86 0.08 (−0.02 to 0.18) 0.11 Motor learning Pursuit Rotor Trial   Time on target, s  1 −5.13 (−9.77 to −0.48) 0.03a 0.10 (−0.40 to 0.59) 0.70 0.18 (−0.30 to 0.65) 0.47   2 −5.78 (−10.62 to −0.93) 0.02a 0.16 (−0.35 to 0.68) 0.53 0.22 (−0.29 to 0.72) 0.40   3 −6.70 (−11.25 to −2.14) 0.004c 0.09 (−0.40 to 0.58) 0.72 0.30 (−0.17 to 0.77) 0.21   4 −2.19 (−7.09 to 2.71) 0.38 0.10 (−0.41 to 0.61) 0.70 0.15 (−0.34 to 0.65) 0.54 Motor Sequence  Learning Test  Total movement  time, s 31.00 (−14.82 to 76.82) 0.18 2.31 (−2.45 to 7.07) 0.34 −1.08 (−5.74 to 3.58) 0.65 Correlations [95% confidence intervals (CIs)] with continuous outcomes were modeled with multiple linear regressions adjusting for age, years of education, WAIS-R score, BMI at baseline, change in BMI, sex and estrogen status, standard deviation of TSH values at all but first visit, baseline 
LT-4 dose, time on LT-4, time on LT-4 dose, baseline hormone value, and baseline value of outcome. The magnitude of the coefficient indicates the estimated change in the outcome for each 1-unit increase in the study change (end of study – baseline) of fT4 or TSH and a 10-unit increase in the study change of fT3. Correlations (95% CIs) with binary outcomes were modeled with multiple logistic regressions adjusting for baseline values of the hormone and the outcome of interest. Coefficients were transformed to estimate the percentage change in the predicted odds of the measure for a 1-unit increase in the study change (end of study – baseline) of fT4 or TSH, and a 10-unit increase in the study change of fT3. The transformed coefficients are estimates of the risk ratios associated with the 1- or 10-unit increase in the study change of respective hormone level. A positive coefficient indicates that the measure increased with increasing hormone levels, whereas a negative coefficient indicates that the measure decreased with increasing hormone levels. Separate models were run for each hormone. Significant coefficients are shown in bold with corresponding unadjusted P values. For each set of related outcome measures, multiple testing adjustments were applied to all the individual P values from models adjusting for the same hormone type. a P was not significant at the 0.05 level when we applied multiple testing adjustments grouping outcomes from the same test together by Bonferroni correction. b These values were calculated as proportion of subjects for whom each measure was ≥15 (for correct on target) or >0 (for incorrect nontarget), because there were floor effects. c P was still significant at the 0.05 level when we applied multiple testing adjustments grouping outcomes from the same test together by Bonferroni correction. 
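As a rough illustration of the two calculations described in the Table 5 footnotes, the sketch below applies a Bonferroni adjustment to the four unadjusted fT4 P values for the Pursuit Rotor trials and converts a logistic regression coefficient into a percentage change in odds. The function names are hypothetical, the family size of four (one P value per trial) is an assumption about how outcomes from the same test were grouped, and the transform (exp(β) − 1) × 100 is the standard one rather than a formula stated in the article.

```python
import math


def bonferroni(p_values, alpha=0.05):
    """Return (adjusted P, still significant?) for each P value in one family.

    Each P value is multiplied by the number of tests in the family and
    capped at 1.0, which is the usual Bonferroni adjustment.
    """
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    return [(p_adj, p_adj < alpha) for p_adj in adjusted]


def percent_change_in_odds(beta):
    """Convert a logistic regression coefficient to a % change in odds."""
    return (math.exp(beta) - 1.0) * 100.0


# Unadjusted fT4 P values for Pursuit Rotor trials 1 to 4, copied from Table 5.
# Treating these four trials as one family is an assumption for illustration.
pursuit_rotor_p = [0.03, 0.02, 0.004, 0.38]
results = bonferroni(pursuit_rotor_p)

for trial, (p_adj, significant) in enumerate(results, start=1):
    print(f"Trial {trial}: adjusted P = {p_adj:.3f}, significant = {significant}")

# An odds ratio of 3.52 corresponds to a 252% increase in the odds.
print(f"{percent_change_in_odds(math.log(3.52)):.0f}% change in odds")
```

Under these assumptions only trial 3 (unadjusted P = 0.004) stays below 0.05 after adjustment, which agrees with footnotes a and c.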
Analyses by actual TSH arm at the end of the study

By the actual TSH arm at the end of the study, 57 subjects had TSH levels in the low-normal range, 28 in the high-normal range, and 53 in the mildly elevated range (Supplemental Table 1). Subjects did not differ in terms of any baseline demographic, clinical, or thyroid hormone variables. Mean L-T4 doses at the end of the study were progressively lower in the three arms (1.52 ± 0.06, 1.10 ± 0.10, and 0.92 ± 0.08 μg/kg/day, respectively, P < 0.001), whereas mean TSH levels were progressively higher (1.34 ± 0.08, 3.74 ± 0.12, and 9.74 ± 0.63 mU/L, P < 0.001). Mean fT4 and fT3 levels were lower in the mildly elevated TSH arm (1.89 ± 0.06, 1.44 ± 0.08, and 1.35 ± 0.04 ng/dL, respectively, P < 0.001; 206.2 ± 5.6, 196.0 ± 7.7, and 175.2 ± 5.4 pg/dL, P < 0.001). Thirty-three subjects in the low-normal TSH arm (72%), 19 in the high-normal TSH arm (40%), and 44 in the mildly elevated TSH arm (98%) had low fT3 levels (82 to 209 pg/dL).

By the actual TSH arm at the end of the study, the SF-36 Bodily Pain subscale was higher in the mildly elevated TSH arm than in the high-normal TSH arm (34% vs 11% high, P = 0.03), and the 1-back number correct on target was lower in the high-normal TSH arm than in the low-normal TSH arm (58% vs 86%, P = 0.002) (Supplemental Tables 2 and 3); neither was significant after correction for multiple testing. There were no other differences between the three arms in health status, mood, or cognitive measures.

Subjects’ perceptions of L-T4 doses

At the final study visit, subjects were asked whether they thought their L-T4 doses at the end of the study were higher, lower, or unchanged from the start of the study and which of the two doses they preferred. Subjects were not able to accurately ascertain changes in L-T4 doses (P = 0.54) (Supplemental Table 4).
However, the majority preferred whichever L-T4 dose they thought was higher (P < 0.001): 68% preferred their dose at the end of the study when they thought their dose had been increased during the study, whereas 96% preferred their dose at the beginning of the study when they thought their dose had been lowered during the study.

Effect size calculations

We performed effect size calculations by using results for the SF-36 mental component summary and mental health scales, POMS depression scale, 3-back correct on target, and Iowa Gambling Task-5, outcomes affected by mild thyroid dysfunction in our previous studies (7, 29, 30). The necessary sample sizes to achieve 80% power at a 5% level of significance were 659 to 5442 subjects (28) (data not shown).

Discussion

In this cohort of L-T4-treated subjects, we found little evidence that altering L-T4 doses in a randomized, blinded fashion to achieve TSH levels in the low-normal, high-normal, or mildly elevated range affected health status, mood, or cognitive function over 6 months. After correction for multiple testing, no outcomes were significantly different when the data were analyzed by discrete groups, either as intention to treat or by actual TSH arm achieved at the end of the study. When the data were analyzed by continuous variables, the SF-36 mental health subscale was inversely correlated with TSH levels, but the magnitude of this correlation was small, and there were no other significant findings.

Most published studies of subclinical hypothyroid subjects are observational (1), and the most recent and largest failed to find significant quality of life, mood, or cognitive effects (31–36). These studies were often limited by the use of screening cognitive batteries, which are not designed to detect subtle defects in targeted cognitive domains likely to be affected by altered thyroid status.
We used sensitive, specific cognitive measures to circumvent these limitations, based on human studies indicating that memory and executive function are preferentially affected, as well as animal studies of thyroid hormone and its receptor distribution in the brain (1).

Seven previous studies have assessed effects of L-T4 therapy on symptoms or neurocognitive outcomes in patients with subclinical hypothyroidism (3, 5, 6, 8–11). Three reported improvements in depression or memory after 6 months, but they were open-label (3, 6, 9), although one also reported a neural substrate for thyroid effects in the frontal cortex by functional magnetic resonance imaging (6). Four were blinded and found minor or no effects after 3 to 12 months (5, 8, 10, 11). Our study extends these findings with detailed measures of cognitive areas that have not been intensively studied. With the exception of Stott et al.’s (11) recent study, where the primary outcome was a tiredness score, ours is also the largest interventional study in subclinical hypothyroidism.

The literature regarding neurocognitive measures in euthyroid or L-T4-treated subjects has similar limitations. Most studies were observational, with only two small interventional trials of L-T4 therapy in subjects with normal TSH levels treated for 8 to 12 weeks. Neither found effects on hypothyroid symptoms, quality of life, psychological function, or limited measures of cognitive function (2, 4). Our results in a larger group of subjects treated for a longer time period extend these findings. Our findings do not support the idea that lowering TSH levels to <2.50 mU/L (37) improves quality of life, mood, or cognitive function.

A major strength of our study was our focus on executive function.
This cognitive domain has not been extensively studied in thyroid disease, because rodent models do not adequately represent executive functions in humans, and many laboratory measures of executive function are insensitive to real-world scenarios. We included the Iowa Gambling Task because this test of executive function assesses decision making under uncertainty and models real-world behavior (38). L-T4-treated patients often complain of problems in this area, but our results do not corroborate objective changes in executive function when L-T4 doses are altered.

Another major strength of our study was the blinded nature of our intervention. When we queried subjects, they could not accurately identify how their L-T4 doses had been altered, but the majority preferred whichever dose they perceived to be the higher dose, confirming an intrinsic bias toward higher L-T4 doses. Studies also indicate that self-knowledge of a thyroid disorder impairs psychological well-being regardless of the TSH level (30, 39), which would bias unblinded studies.

We found a high prevalence of low serum fT3 levels at baseline and at the end of the study in all three arms. However, fT3 levels did not correlate with our outcomes. Previous reports have also described a high prevalence of low T3 levels in L-T4-treated subjects (40). However, studies of liothyronine add-on or monotherapy in hypothyroidism have not shown improvements in quality of life, mood, or cognitive outcomes (40). Additional studies have suggested that polymorphisms in deiodinase or brain thyroid hormone transporter genes correlate with psychological scores and response to liothyronine, so subsets of L-T4-treated patients may respond to L-T3 (40).

Our study also has limitations. A major limitation was our sample size, and it is possible that we were underpowered to detect small effects.
To address this problem, we performed an effect size calculation, which showed that large numbers of subjects (>600, depending on outcome) would need to be studied to reach statistical significance. The small magnitude of our effects suggests that clinically meaningful alterations are unlikely, but it remains possible that subtle effects were missed.

We did not include an untreated euthyroid control group, so we cannot ascertain whether our subjects had decrements in quality of life, mood, or cognition at baseline compared with the general population. However, we previously published results of the same tests of quality of life, mood, and cognitive function in L-T4-treated subjects compared with euthyroid control subjects and found mild decrements in the SF-36 (mental component summary, mental health subscale, and vitality subscale) without differences in mood or cognitive function (30). Therefore, we suspect that subjects in the current study had slightly lower quality of life than matched euthyroid subjects.

We performed a large number of correlations, and although we accounted for this in our analyses, it is possible that some of our minor findings were due to chance. Most of our subjects were women and were younger and slimmer than the U.S. population. Most of our subjects were Caucasian. Our subjects were heterogeneous in terms of thyroid diagnosis and length of L-T4 treatment. We limited our study to 6 months to optimize subject retention, recognizing that this is sufficient time to observe changes in our outcomes. Many of our subjects experienced variations in TSH levels at interim visits that necessitated L-T4 dose adjustments, which we accounted for in our analysis. One-third of our subjects did not achieve target TSH levels, particularly in the high-normal TSH group.
To address this limitation, we conducted separate intention-to-treat and actual end-of-study analyses, as well as analyses using changes in TSH and thyroid hormones as continuous variables. These complementary analyses showed similar results, strengthening our conclusions. In addition, we note that regardless of the ultimate TSH attained, L-T4 doses were altered in each arm, consistent with the study design. Because patients often request changes in their L-T4 doses regardless of their TSH levels, an interpretation of our results based on L-T4 dose adjustments is a valuable perspective for clinical practice.

We attempted to collect blood samples at a consistent time of day, but this was not always possible. In healthy and L-T4-treated subjects, TSH levels decrease slightly between 07:00 and 09:00 and then remain stable until the evening (41). Finally, we limited our cognitive testing to executive function and memory, although studies do not indicate major effects in other areas (1).

In summary, we found no relevant differences in health status, mood, memory, or executive functions in hypothyroid subjects when L-T4 doses were altered in a randomized, blinded fashion to achieve TSH levels in the low-normal, high-normal, or mildly elevated range. Given our limited sample size, additional studies would be helpful, particularly in targeted populations (e.g., symptomatic subjects, subjects with low fT3 levels, or subjects with genetic polymorphisms that affect thyroid hormone action). In the absence of definitive data, reasonable expectations should be discussed with treated hypothyroid patients who report symptoms in these areas and request higher L-T4 doses or alternative thyroid hormone preparations.
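The sample sizes quoted in the effect size calculations depend on how small the observed effects were. A minimal sketch of that relationship is shown below, assuming a simple two-group comparison of means with a normal approximation; the article does not describe its exact method beyond citing Cohen (28), and the trial itself had three arms. The function name and the standardized effect sizes are illustrative and are not taken from the study data.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate subjects per group for a two-sided, two-sample comparison.

    Uses the normal-approximation formula
        n = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2
    where d is the standardized difference between group means.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Hypothetical standardized effect sizes, chosen only to show the scaling.
for d in (0.5, 0.2, 0.1):
    print(f"d = {d}: about {n_per_group(d)} subjects per group")
```

Halving the effect size roughly quadruples the required sample, which is why effects as small as those observed here would call for hundreds to thousands of subjects.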
Abbreviations: ALS, Affective Lability Scale; BMI, body mass index; CV, coefficient of variation; fT3, free triiodothyronine; fT4, free thyroxine; L-T4, levothyroxine; OHSU, Oregon Health & Science University; POMS, Profile of Mood States; SF-36, 36-Item Short Form Health Survey; TSH, thyrotropin; WAIS-R, Wechsler Adult Intelligence Scale–Revised.

Acknowledgments

We thank the staff of the OHSU Clinical and Translational Research Center for excellent patient care and research support and the Biostatistics & Design Program for data analysis expertise.

Financial Support: This work was supported by National Institutes of Health Grants R01 DK075496 (to M.H.S.) and UL1 RR024120 (to OHSU).

Clinical Trial Information: ClinicalTrials.gov no. NCT00565864 (registered November 30, 2007).

Disclosure Summary: The authors have nothing to disclose.

References

1. Samuels MH. Thyroid disease and cognition. Endocrinol Metab Clin North Am. 2014;43(2):529–543.
2. Pollock MA, Sturrock A, Marshall K, Davidson KM, Kelly CJ, McMahon AD, McLaren EH. Thyroxine treatment in patients with symptoms of hypothyroidism but thyroid function tests within the reference range: randomised double blind placebo controlled crossover trial. BMJ. 2001;323(7318):891–895.
3. Bono G, Fancellu R, Blandini F, Santoro G, Mauri M. Cognitive and affective status in mild hypothyroidism and interactions with L-thyroxine treatment. Acta Neurol Scand. 2004;110(1):59–66.
4. Walsh JP, Ward LC, Burke V, Bhagat CI, Shiels L, Henley D, Gillett MJ, Gilbert R, Tanner M, Stuckey BG. Small changes in thyroxine dosage do not produce measurable changes in hypothyroid symptoms, well-being, or quality of life: results of a double-blind, randomized clinical trial. J Clin Endocrinol Metab. 2006;91(7):2624–2630.
5. Jorde R, Waterloo K, Storhaug H, Nyrnes A, Sundsfjord J, Jenssen TG. Neuropsychological function and symptoms in subjects with subclinical hypothyroidism and the effect of thyroxine treatment. J Clin Endocrinol Metab. 2006;91(1):145–153.
6. Zhu DF, Wang ZX, Zhang DR, Pan ZL, He S, Hu XP, Chen XC, Zhou JN. fMRI revealed neural substrate for reversible working memory dysfunction in subclinical hypothyroidism. Brain. 2006;129(Pt 11):2923–2930.
7. Samuels MH, Schuff KG, Carlson NE, Carello P, Janowsky JS. Health status, mood, and cognition in experimentally induced subclinical hypothyroidism. J Clin Endocrinol Metab. 2007;92(7):2545–2551.
8. Razvi S, Ingoe L, Keeka G, Oates C, McMillan C, Weaver JU. The beneficial effect of L-thyroxine on cardiovascular risk factors, endothelial function, and quality of life in subclinical hypothyroidism: randomized, crossover trial. J Clin Endocrinol Metab. 2007;92(5):1715–1723.
9. Correia N, Mullally S, Cooke G, Tun TK, Phelan N, Feeney J, Fitzgibbon M, Boran G, O’Mara S, Gibney J. Evidence for a specific defect in hippocampal memory in overt and subclinical hypothyroidism. J Clin Endocrinol Metab. 2009;94(10):3789–3797.
10. Parle J, Roberts L, Wilson S, Pattison H, Roalfe A, Haque MS, Heath C, Sheppard M, Franklyn J, Hobbs FD. A randomized controlled trial of the effect of thyroxine replacement on cognitive function in community-living elderly subjects with subclinical hypothyroidism: the Birmingham Elderly Thyroid study. J Clin Endocrinol Metab. 2010;95(8):3623–3632.
11. Stott DJ, Rodondi N, Kearney PM, Ford I, Westendorp RGJ, Mooijaart SP, Sattar N, Aubert CE, Aujesky D, Bauer DC, Baumgartner C, Blum MR, Browne JP, Byrne S, Collet TH, Dekkers OM, den Elzen WPJ, Du Puy RS, Ellis G, Feller M, Floriani C, Hendry K, Hurley C, Jukema JW, Kean S, Kelly M, Krebs D, Langhorne P, McCarthy G, McCarthy V, McConnachie A, McDade M, Messow M, O’Flynn A, O’Riordan D, Poortvliet RKE, Quinn TJ, Russell A, Sinnott C, Smit JWA, Van Dorland HA, Walsh KA, Walsh EK, Watt T, Wilson R, Gussekloo J; TRUST Study Group. Thyroid hormone therapy for older adults with subclinical hypothyroidism. N Engl J Med. 2017;376(26):2534–2544.
12. Spreen O, Strauss EA. General intellectual ability and assessment of premorbid intelligence. In: Spreen O, Strauss EA, eds. A Compendium of Neuropsychological Tests: Administration, Norms, and Commentary. New York, NY: Oxford University Press; 1998:90–102.
13. Billewicz WZ, Chapman RS, Crooks J, Day ME, Gossage J, Wayne E, Young JA. Statistical methods applied to the diagnosis of hypothyroidism. Q J Med. 1969;38(150):255–266.
14. McMillan C, Bradley C, Razvi S, Weaver J. Evaluation of new measures of the impact of hypothyroidism on quality of life and symptoms: the ThyDQoL and ThySRQ. Value Health. 2008;11(2):285–294.
15. Spreen O, Strauss EA. Adaptive behavior and personality. In: Spreen O, Strauss EA, eds. A Compendium of Neuropsychological Tests: Administration, Norms, and Commentary. New York, NY: Oxford University Press; 1998:612–616.
16. Spreen O, Strauss EA. Adaptive behavior and personality. In: Spreen O, Strauss EA, eds. A Compendium of Neuropsychological Tests: Administration, Norms, and Commentary. New York, NY: Oxford University Press; 1998:644–646.
17. Harvey PD, Greenberg BR, Serper MR. The affective lability scales: development, reliability, and validity. J Clin Psychol. 1989;45(5):786–793.
18. Byrd DA, Touradji P, Tang MX, Manly JJ. Cancellation test performance in African American, Hispanic, and White elderly. J Int Neuropsychol Soc. 2004;10(3):401–411.
19. Lezak MD, Howieson DB, Loring DW. Orientation and attention. In: Lezak MD, Howieson DB, Loring DW, eds. Neuropsychological Assessment. New York, NY: Oxford University Press; 1995:371–374.
20. Singh V, Khan A. Heterogeneity in choices on Iowa Gambling Task: preference for infrequent-high magnitude punishment. Mind Soc. 2009;8(1):43–57.
21. Lezak MD, Howieson DB, Loring DW. Orientation and attention. In: Lezak MD, Howieson DB, Loring DW, eds. Neuropsychological Assessment. New York, NY: Oxford University Press; 1996:363–364.
22. Spreen O, Strauss EA. Executive functions. In: Spreen O, Strauss EA, eds. A Compendium of Neuropsychological Tests: Administration, Norms, and Commentary. New York, NY: Oxford University Press; 1998:208–212.
23. Lezak M, Howieson DB, Loring DW. Memory I: tests. In: Lezak M, Howieson DB, Loring DW, eds. Neuropsychological Assessment. New York, NY: Oxford University Press; 1995:444–450.
24. van Gorp WG, Altshuler L, Theberge DC, Mintz J. Declarative and procedural memory in bipolar disorder. Biol Psychiatry. 1999;46(4):525–531.
25. Spreen O, Strauss EA. Motor tests. In: Spreen O, Strauss EA, eds. A Compendium of Neuropsychological Tests: Administration, Norms, and Commentary. New York, NY: Oxford University Press; 1998:577–599.
26. Spencer CA, Hollowell JG, Kazarosyan M, Braverman LE. National Health and Nutrition Examination Survey III thyroid-stimulating hormone (TSH)–thyroperoxidase antibody relationships demonstrate that TSH upper reference limits may be skewed by occult thyroid dysfunction. J Clin Endocrinol Metab. 2007;92(11):4236–4240.
27. R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2015. Available at: http://www.R-project.org/. Accessed 5 June 2017.
28. Cohen J. Statistical Power Analysis for the Behavioral Sciences. Mahwah, NJ: Lawrence Erlbaum Associates; 1988.
29. Samuels MH, Kolobova I, Smeraglio A, Niederhausen M, Janowsky JS, Schuff KG. Effects of thyroid function variations within the laboratory reference range on health status, mood, and cognition in levothyroxine-treated subjects. Thyroid. 2016;26(9):1173–1184.
30. Samuels MH, Kolobova I, Smeraglio A, Peters D, Janowsky JS, Schuff KG. The effects of levothyroxine replacement or suppressive therapy on health status, mood, and cognition. J Clin Endocrinol Metab. 2014;99(3):843–851.
31. Roberts LM, Pattison H, Roalfe A, Franklyn J, Wilson S, Hobbs FD, Parle JV. Is subclinical thyroid dysfunction in the elderly associated with depression or cognitive dysfunction? Ann Intern Med. 2006;145(8):573–581.
32. Wijsman LW, de Craen AJ, Trompet S, Gussekloo J, Stott DJ, Rodondi N, Welsh P, Jukema JW, Westendorp RG, Mooijaart SP. Subclinical thyroid dysfunction and cognitive decline in old age. PLoS One. 2013;8(3):e59199.
33. Fjaellegaard K, Kvetny J, Allerup PN, Bech P, Ellervik C. Well-being and depression in individuals with subclinical hypothyroidism and thyroid autoimmunity: a general population study. Nord J Psychiatry. 2015;69(1):73–78.
34. van de Ven AC, Netea-Maier RT, de Vegt F, Ross HA, Sweep FC, Kiemeney LA, Hermus AR, den Heijer M. Is there a relationship between fatigue perception and the serum levels of thyrotropin and free thyroxine in euthyroid subjects? Thyroid. 2012;22(12):1236–1243.
35. Klaver EI, van Loon HC, Stienstra R, Links TP, Keers JC, Kema IP, Kobold AC, van der Klauw MM, Wolffenbuttel BH. Thyroid hormone status and health-related quality of life in the LifeLines Cohort Study. Thyroid. 2013;23(9):1066–1073.
36. Engum A, Bjøro T, Mykletun A, Dahl AA. An association between depression, anxiety and thyroid function--a clinical fact or an artefact? Acta Psychiatr Scand. 2002;106(1):27–34.
37. Wartofsky L, Dickey RA. The evidence for a narrower thyrotropin reference range is compelling. J Clin Endocrinol Metab. 2005;90(9):5483–5488.
38. Winstanley CA, Clark L. Translational models of gambling-related decision-making. Curr Top Behav Neurosci. 2016;28:93–120.
39. Panicker V, Evans J, Bjøro T, Asvold BO, Dayan CM, Bjerkeset O. A paradoxical difference in relationship between anxiety, depression and thyroid function in subjects on and not on T4: findings from the HUNT study. Clin Endocrinol (Oxf). 2009;71(4):574–580.
40. Jonklaas J, Bianco AC, Bauer AJ, Burman KD, Cappola AR, Celi FS, Cooper DS, Kim BW, Peeters RP, Rosenthal MS, Sawka AM; American Thyroid Association Task Force on Thyroid Hormone Replacement. Guidelines for the treatment of hypothyroidism: prepared by the American Thyroid Association Task Force on Thyroid Hormone Replacement. Thyroid. 2014;24(12):1670–1751.
41. Roelfsema F, Veldhuis JD. Thyrotropin secretion patterns in health and disease. Endocr Rev. 2013;34(5):619–657.

Copyright © 2018 Endocrine Society
Journal of Clinical Endocrinology and Metabolism, Oxford University Press

Effects of Altering Levothyroxine (L-T4) Doses on Quality of Life, Mood, and Cognition in L-T4 Treated Subjects

Publisher
Oxford University Press
Copyright
Copyright © 2018 Endocrine Society
ISSN
0021-972X
eISSN
1945-7197
D.O.I.
10.1210/jc.2017-02668

Abstract

Background: The brain is a critical target organ for thyroid hormone, but it is unclear whether variations in thyroid function within and near the reference range affect quality of life, mood, or cognition.

Methods: A total of 138 subjects with levothyroxine (L-T4)-treated hypothyroidism and normal thyrotropin (TSH) levels underwent measures of quality of life (36-Item Short Form Health Survey, Underactive Thyroid-Dependent Quality of Life Questionnaire), mood (Profile of Mood States, Affective Lability Scale), and cognition (executive function, memory). They were then randomly assigned to receive an unchanged, higher, or lower L-T4 dose in double-blind fashion, targeting one of three TSH ranges (0.34 to 2.50, 2.51 to 5.60, or 5.61 to 12.0 mU/L). Doses were adjusted every 6 weeks based on TSH levels. Baseline measures were reassessed at 6 months.

Results: At the end of the study, by intention to treat, mean L-T4 doses were 1.50 ± 0.07, 1.32 ± 0.07, and 0.78 ± 0.08 μg/kg (P < 0.001), and mean TSH levels were 1.85 ± 0.25, 3.93 ± 0.38, and 9.49 ± 0.80 mU/L (P < 0.001), respectively, in the three arms. There were minor differences in a few outcomes between the three arms, which were no longer significant after correction for multiple comparisons. Subjects could not ascertain how their L-T4 doses had been adjusted (P = 0.55) but preferred L-T4 doses they perceived to be higher (P < 0.001).

Conclusions: Altering L-T4 doses in hypothyroid subjects to vary TSH levels in and near the reference range does not affect quality of life, mood, or cognition. L-T4-treated subjects prefer perceived higher L-T4 doses despite a lack of objective benefit. Adjusting L-T4 doses in hypothyroid patients based on symptoms in these areas may not result in significant clinical improvement.

Overt hypothyroidism interferes with brain functions (1), but effects of variations in thyroid function within and near the reference range are less clear. Observational studies of this issue have been inconsistent, and a few randomized, blinded intervention studies have been negative (1–11). In the absence of consensus, many patients with mild thyrotropin (TSH) elevations are treated with levothyroxine (L-T4) to improve neurocognitive symptoms, and L-T4 doses are often increased to treat persistent symptoms.

We recruited hypothyroid subjects treated with L-T4 who underwent testing for health status, mood, and cognitive function. We targeted cognitive domains preferentially affected by mild thyroid dysfunction (memory and executive function) (1). We then adjusted subjects’ L-T4 doses in a blinded fashion over 6 months to achieve one of three TSH ranges (low-normal, high-normal, or mildly elevated), and repeated the tests. We hypothesized that altering TSH levels in these ranges would affect quality of life, mood, memory, and executive function.

Experimental Subjects

A total of 197 hypothyroid subjects receiving L-T4 monotherapy were recruited from the authors’ clinics, through review of electronic health records, and by flyers. All were diagnosed as adults and had past elevated TSH levels. L-T4 doses were stable for ≥3 months. None had acute or chronic illnesses or were on medications that affect thyroid hormone levels, mood, or cognition. Stable doses of oral contraceptive or estrogen therapy were allowed. Testing was done during the first 14 days after onset of menstrual bleeding or an oral contraceptive cycle in premenopausal women.

Materials and Methods

Experimental design

The protocol was approved by the Oregon Health & Science University (OHSU) Institutional Review Board. Subjects gave written informed consent.

Screening visit

Subjects were screened for general health, medicines, thyroid status, and mood or cognitive disorders by history, physical examination, and laboratory testing. General intelligence was estimated by the Wechsler Adult Intelligence Scale–Revised (WAIS-R) Vocabulary subtest (12).
Run-in visits

Subjects taking branded L-T4 with normal screening TSH levels proceeded directly to the baseline visit. Subjects who had abnormal screening TSH levels or were taking generic L-T4 were placed on branded L-T4 and underwent run-in visits every 6 weeks with L-T4 dose adjustments until doses were stable, with normal TSH levels for 3 months.

Baseline visit

Within 6 weeks of the screening or final run-in visit, subjects returned for a 4-hour baseline visit. Subjects refrained from taking their L-T4 dose that morning. Serum TSH, free thyroxine (fT4), and free triiodothyronine (fT3) levels were obtained. Subjects self-completed the following validated surveys:

- Billewicz scale of hypothyroid-related symptoms (13)
- Underactive Thyroid-Dependent Quality of Life Questionnaire, which measures the impact of hypothyroidism on quality of life (14)
- 36-Item Short Form Health Survey (SF-36), a general health questionnaire (15)
- Profile of Mood States (POMS), a mood questionnaire (16)
- Affective Lability Scale, where subjects rate the tendency of their moods to fluctuate (17)

Cognitive tests were administered by a single experienced research assistant.

Executive function

Attention and concentration: Letter Cancellation Test. The subject was given a sheet of paper with 6 lines of 52 letters in random sequence and instructed to circle two specified target letters as quickly as possible. The score was the number of errors and time needed (18).

Cognitive flexibility: Trail Making Test. The subject connected circles on a sheet of paper as quickly as possible. In Part A, the subject drew lines to connect numbered circles in ascending order. In Part B, the subject drew lines to connect circles in ascending order, alternating between numbers and letters. The score was the number of errors and time needed (19).

Decision making: Iowa Gambling Task. Four decks of cards were shown face down on a computer screen. The subject chose cards from any deck, resulting in the gain or loss of money. The subject was unaware that two decks were advantageous (small gains, smaller losses), and two were disadvantageous (large gains, larger losses). The subject’s choices were classified as advantageous (X) or disadvantageous (Y), with a net score of X − Y, over 5 trials of 100 cards each (20).

Working memory

N-Back test. A series of letters was presented one at a time on a computer screen. Subjects responded each time a letter appeared that they had seen on the previous screen (1-back). The task was repeated with intervening letters imposed while the subjects had to hold in mind letters that had appeared 2-back and then 3-back. The score was the total number correct on target and the total number incorrect nontarget for each condition (21).

Subject-Ordered Pointing. Subjects viewed a series of computer screens that presented abstract drawings (6, 8, 10, or 12 per screen). Each screen in a set showed the same array of drawings but in a different spatial arrangement. The subject indicated one drawing per screen, avoiding the same drawing on subsequent screens. Subjects erred when they chose a drawing that had been previously chosen. Each set was repeated three times. The score was the total number of errors across each screen set (22).

Declarative memory

Paragraph Recall (verbal memory). Subjects were read a brief story and verbally recalled it immediately and after 30 minutes. The score was the total number of story elements recalled at each interval (23).

Motor learning

Pursuit Rotor. Subjects held a photosensitive wand to maintain contact with a 2-cm light disk rotating on a turntable (Lafayette Instrument Company, Lafayette, IN). Two blocks of eight 20-second trials were administered, with a 20-second rest after each trial and a 60-second rest period after four trials. After a 30-minute interval, the two blocks were repeated. The score was the mean total time the stylus remained on target (24).
Motor Sequence Learning Test. The subject memorized two keypress sequences, each associated with a letter of the alphabet. As soon as that letter appeared on the computer screen, the subject performed the appropriate sequence as quickly as possible. Subjects performed 10 blocks of 18 trials each. The score was the total movement time (time from character presentation to completion of the sequence) (25).

Randomization

Immediately after the baseline visit, subjects were randomly assigned to one of three arms: low-normal TSH (0.34 to 2.50 mU/L), high-normal TSH (2.51 to 5.60 mU/L), or mildly elevated TSH (5.61 to 12.0 mU/L). These were based on the OHSU TSH assay reference range, recent debate over restricting the upper limit to 2.50 mU/L (26), and our intention to restrict elevated TSH levels to the subclinical hypothyroid range. Randomization was stratified by whether the subject’s TSH was low- or high-normal.

L-T4 dosing

Taking into account baseline TSH levels, the dispensing physician (K.G.S.) initially determined whether subjects should continue their usual L-T4 dose or receive a different dose to achieve the assigned target TSH ranges. If a different dose was indicated, the subject’s usual dose was altered by 25 to 50 μg, depending on the difference between the initial and target TSH levels. The principal investigator (M.H.S.), research assistants, and subjects were unaware of the treatment assignment or L-T4 doses. The OHSU research pharmacy dispensed 6-week supplies of L-T4 pills in opaque gel capsules to maintain blinding.

Interim visits

At 6, 12, and 18 weeks, subjects returned for brief visits. The principal investigator assessed clinical effects and determined whether the subject could comfortably continue the study. TSH levels from these visits were reviewed by K.G.S., who adjusted L-T4 doses if the interim TSH level was not in the target range.
L-T4 doses were adjusted by 12.5 to 50 μg depending on the difference between the interim and target TSH levels, and the research pharmacy dispensed new 6-week supplies. Additional interim visits were allowed if the TSH level was not in the target range at 18 weeks. Once the TSH was in the target range, no further interim visits were scheduled, and the subject proceeded to the end-of-study visit.

End-of-study visit

Approximately 6 weeks after the final interim visit, baseline measurements were repeated. At this visit, TSH, fT4, and fT3 levels were measured, and this TSH level was subsequently used to assign subjects to actual end-of-study TSH arms for the purposes of data analysis. The subject was then placed back on his or her usual L-T4 dose, or a dose that led to better TSH control during the study, per subject preference.

Analytic methods

TSH was measured by immunochemiluminometric assay (Beckman Coulter): functional sensitivity 0.02 mU/L, normal range 0.34 to 5.60 mU/L, and interassay coefficient of variation (CV) 5% at 0.70 mU/L. fT4 was measured by direct equilibrium dialysis (Quest Diagnostics): sensitivity 0.08 ng/dL, normal range 0.8 to 2.7 ng/dL, and interassay CV 6.8% at 0.3 ng/dL and 1.6% at 3.8 ng/dL. fT3 was measured by tracer dialysis (Quest Diagnostics): sensitivity 25 pg/dL, normal range 210 to 440 pg/dL, and interassay CV 4%. TSH levels were measured at the time of testing, with stable assay characteristics during the study. fT4 and fT3 levels were batched and analyzed at the end of the study. All samples were run in duplicate.
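The assignment of subjects to actual end-of-study TSH arms, mentioned under the end-of-study visit, can be illustrated with a minimal Python sketch. It is not the authors' procedure (their analyses were run in R); it only applies the three target ranges stated in the Abstract (0.34 to 2.50, 2.51 to 5.60, and 5.61 to 12.0 mU/L). The function names `classify_tsh` and `achieved_target` are hypothetical, and the handling of values outside all three ranges (returned as `None`) and the rounding to two decimals are assumptions, because the text does not say how such values were treated.

```python
from typing import Optional

# Target TSH ranges in mU/L, taken from the Abstract; both bounds inclusive.
TSH_ARMS = (
    ("low-normal", 0.34, 2.50),
    ("high-normal", 2.51, 5.60),
    ("mildly elevated", 5.61, 12.0),
)


def classify_tsh(tsh_mu_per_l: float) -> Optional[str]:
    """Return the arm whose range contains the TSH value, or None if outside all ranges."""
    # Assumption: TSH is reported to two decimals, so rounding closes the
    # 0.01 mU/L gaps between adjacent ranges.
    value = round(tsh_mu_per_l, 2)
    for name, low, high in TSH_ARMS:
        if low <= value <= high:
            return name
    return None  # assumption: out-of-range values are left unclassified


def achieved_target(assigned_arm: str, end_of_study_tsh: float) -> bool:
    """True if the end-of-study TSH falls in the randomized (assigned) arm."""
    return classify_tsh(end_of_study_tsh) == assigned_arm


if __name__ == "__main__":
    # The three mean end-of-study TSH values from the Abstract, used only as sample inputs.
    for tsh in (1.85, 3.93, 9.49):
        print(tsh, classify_tsh(tsh))
```

A helper such as `achieved_target` makes the difference between the two analyses concrete: the intention-to-treat analysis groups subjects by the assigned arm, whereas the actual-arm analysis groups them by the classification of the end-of-study TSH.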
Statistical methods

Differences between arms for continuous measures were analyzed with multiple linear regression models adjusted for age, sex and estrogen status, years of education, WAIS-R score, baseline body mass index (BMI), change in BMI, baseline TSH (low- vs high-normal), standard deviation of TSH values at interim and last visits, baseline L-T4 dose (μg/kg), time on L-T4, time on L-T4 dose, and baseline value of the outcome variable. Binary measures were analyzed with multiple logistic regression models that adjusted for baseline TSH (low- vs high-normal) and baseline values of the outcomes because of the more limited nature of the binary data. For outcomes with significant differences between arms, the Tukey multiple comparison procedure was used to determine which arms were significantly different and adjust P values for pairwise differences between arms. Because not all outcomes were independent, multiple testing P value adjustments were made for groups of outcomes that included a significant outcome. These outcomes included the SF-36 summary and subscales (10 outcomes total), the POMS subscales (6 outcomes total), N-Back (correct and incorrect variables for 1-back, 2-back, 3-back; 6 outcomes total), and Letter Cancellation Test (time and error, 2 outcomes total). Analyses were conducted as intention-to-treat and by the actual TSH arms subjects achieved at the end of the study. We also examined relationships between outcomes and TSH, fT4, or fT3 by using the same regression analyses but substituting, in separate models, the selected hormone for the categorical arms variable. All analyses were conducted in R version 3.3.2 (R Foundation for Statistical Computing) (27).

Results

Demographic, clinical, and thyroid function parameters

Figure 1 provides a flowchart of the study design and subject enrollment. Of the 197 subjects initially screened, 24 were excluded because of abnormal laboratory tests [low-density lipoproteins >160 mg/dL (n = 11), glucose >120 mg/dL (n = 2), elevated serum calcium (n = 1)], abnormal electrocardiogram (n = 3), TSH out of range (n = 5), or medical issues (n = 2). Fifty subjects were taking branded L-T4 and had normal screening TSH levels, and they proceeded directly to the baseline visit. One hundred twenty-three subjects were taking generic L-T4 or had abnormal screening TSH levels and proceeded to the run-in. One hundred fifty-one subjects completed the baseline visit. Twenty-two subjects withdrew during the run-in [personal issues (n = 17), started other medications (n = 2), started weight loss diet (n = 1), medical issues (n = 2)]. Thirteen subjects withdrew before the final visit [personal issues (n = 5), medical issues (n = 6), pregnancy (n = 1), started weight loss diet (n = 1)]. Seven of these withdrew before the 6-week interim visit, 4 before the 12-week interim visit, and 2 before the 18-week interim visit. Subjects who were excluded or withdrew were not different from the study population in demographic or clinical attributes.

Figure 1. Flowchart of study design and enrollment.

A total of 138 subjects completed the study (125 female, 13 male). They were aged 27 to 70 years and were receiving L-T4 for primary hypothyroidism (n = 112), hypothyroidism after iodine-131 therapy for Graves disease (n = 17), postpartum thyroiditis leading to permanent hypothyroidism (n = 3), or thyroid surgery (n = 6). They had received L-T4 for 5 months to 50 years (mean 12 years). Mean time on current L-T4 dose was 1.6 years. Baseline data from these subjects have been published (28).

During the run-in, 92 subjects (67%) switched from generic to branded L-T4, and 36 (26%) needed L-T4 dose adjustment.
Percentages of subjects needing a run-in or dose adjustment did not differ between the three arms (P = 0.50). At baseline, 87 subjects (63%) had low-normal TSH and 51 (37%) had high-normal TSH levels. Nineteen subjects (14%) did not need L-T4 dose adjustments at interim visits, and 119 (86%) needed 1 to 5 additional dose adjustments (mean 2.1). Forty-five subjects (33%) did not achieve their intended target TSH ranges (17%, 64%, and 16% in the low-normal, high-normal, and mildly elevated TSH arms, respectively). It was particularly difficult to maintain subjects in the high-normal TSH arm, because small changes in TSH levels near the lower or upper cutoffs of this arm moved subjects into one of the other two arms. For this reason, we conducted two separate analyses, one as intention-to-treat by randomized arm and one based on actual TSH levels at the end-of-study visit. Results are presented for the intention to treat analysis first, followed by the actual end arm analysis. By intention to treat, subjects in the three arms did not differ in age, WAIS-R score, years in school, sex, estrogen status, ethnicity, BMI, or duration at current L-T4 dose (Table 1). Duration of L-T4 treatment was longer in the high-normal TSH arm (P < 0.001). Mean L-T4 doses at the end of the study were progressively lower in the three arms (1.50 ± 0.07, 1.32 ± 0.07, and 0.78 ± 0.08 μg/kg/day, respectively, P < 0.001), whereas mean TSH levels were progressively higher (1.85 ± 0.25, 3.93 ± 0.38, and 9.49 ± 0.80 mU/L, P < 0.001). Mean fT4 levels were lower in the mildly elevated TSH arm (1.79 ± 0.06, 1.64 ± 0.07, and 1.34 ± 0.05 ng/dL, respectively, P < 0.001), whereas mean fT3 levels were not significantly different between the three arms (201.4 ± 6.0, 191.4 ± 6.2, and 184.1 ± 6.6 pg/dL, P = 0.15). Seventy-two subjects (52%) had low baseline fT3 levels (118 to 209 pg/dL). 
At the end of the study, 28 subjects in the low-normal TSH arm (61%), 34 in the high-normal TSH arm (72%), and 34 in the mildly elevated TSH arm (76%) had low fT3 levels (82 to 209 pg/dL). Table 1. Clinical Parameters and Thyroid Function Tests at Baseline and End of Study, Analyzed as Intention to Treat End of Study Baseline Arm 1 Low Normal TSH Arm 2 High Normal TSH Arm 3 Mildly Elevated TSH P No. of subjects 138 46 47 45 Age, y 49.2 ± 1 49.5 ± 1.7 50.9 ± 1.8 49.3 ± 1.6 0.77 WAIS-R scorea 10.9 ± 0.2 10.8 ± 0.3 11.0 ± 0.3 10.8 ± 0.3 0.88 Years in schoola 15.9 ± 0.3 15.6 ± 0.5 16.1 ± 0.4 15.9 ± 0.4 0.74 Sexa 91% Female 41 (89.1%) 41 (87.2%) 43 (95.6%) 0.45 9% Male 5 (10.9%) 6 (12.8%) 2 (4.4%) Estrogen statusa 9% Male 5 (10.9%) 6 (12.8%) 2 (4.4%) 0.50 39% Prenone 19 (41.3%) 15 (31.9%) 20 (44.4%) 9% Preon 4 (8.7%) 5 (10.6%) 4 (8.9%) 38% Postnone 18 (39.1%) 19 (40.4%) 15 (33.3%) 4% Poston 0 (0%) 2 (4.3%) 4 (8.9%) Ethnicitya 92% White 44 (95.7%) 41 (87.2%) 42 (93.3%) 0.35 8% Other 2 (4.3%) 6 (12.8%) 3 (6.7%) BMI, kg/m2 27.8 ± 0.5 28.6 ± 0.8 27.3 ± 0.8 27.6 ± 1.0 0.58 L-T4 duration of treatment, ya 11.9 ± 0.8 9.9 ± 1.3 15.9 ± 1.7 9.7 ± 0.9 <0.001b,c L-T4 duration at current dose, ya 1.63 ± 0.19 1.75 ± 0.35 1.50 ± 0.18 1.63 ± 0.40 0.86 L-T4 dose, μg/kg 1.44 ± 0.04 1.50 ± 0.07 1.32 ± 0.07 0.78 ± 0.08 <0.001c,d L-T4 dose change, μg/kg 0.14 ± 0.02 −0.21 ± 0.03 −0.64 ± 0.05 <0.001b,c,d TSH, mu/L 2.21 ± 0.13 1.85 ± 0.25 3.93 ± 0.38 9.49 ± 0.80 <0.001b,c,d TSH change, mu/L −0.18 ± 0.33 1.60 ± 0.42 7.23 ± 0.86 <0.001c,d Free T4, ng/dL 1.67 ± 0.03 1.79 ± 0.06 1.64 ± 0.07 1.34 ± 0.05 <0.001c,d Free T4, ng/dL change 0.13 ± 0.07 −0.04 ± 0.07 −0.32 ± 0.06 <0.001c,d Free T3, pg/dL 214 ± 4.2 201.4 ± 6.0 191.1 ± 6.2 184.1 ± 6.6 0.15 Free T3, pg/dL change −18.9 ± 9.3 −14.4 ± 6.9 −32.4 ± 7.9 0.27 End of Study Baseline Arm 1 Low Normal TSH Arm 2 High Normal TSH Arm 3 Mildly Elevated TSH P No. 
of subjects 138 46 47 45 Age, y 49.2 ± 1 49.5 ± 1.7 50.9 ± 1.8 49.3 ± 1.6 0.77 WAIS-R scorea 10.9 ± 0.2 10.8 ± 0.3 11.0 ± 0.3 10.8 ± 0.3 0.88 Years in schoola 15.9 ± 0.3 15.6 ± 0.5 16.1 ± 0.4 15.9 ± 0.4 0.74 Sexa 91% Female 41 (89.1%) 41 (87.2%) 43 (95.6%) 0.45 9% Male 5 (10.9%) 6 (12.8%) 2 (4.4%) Estrogen statusa 9% Male 5 (10.9%) 6 (12.8%) 2 (4.4%) 0.50 39% Prenone 19 (41.3%) 15 (31.9%) 20 (44.4%) 9% Preon 4 (8.7%) 5 (10.6%) 4 (8.9%) 38% Postnone 18 (39.1%) 19 (40.4%) 15 (33.3%) 4% Poston 0 (0%) 2 (4.3%) 4 (8.9%) Ethnicitya 92% White 44 (95.7%) 41 (87.2%) 42 (93.3%) 0.35 8% Other 2 (4.3%) 6 (12.8%) 3 (6.7%) BMI, kg/m2 27.8 ± 0.5 28.6 ± 0.8 27.3 ± 0.8 27.6 ± 1.0 0.58 L-T4 duration of treatment, ya 11.9 ± 0.8 9.9 ± 1.3 15.9 ± 1.7 9.7 ± 0.9 <0.001b,c L-T4 duration at current dose, ya 1.63 ± 0.19 1.75 ± 0.35 1.50 ± 0.18 1.63 ± 0.40 0.86 L-T4 dose, μg/kg 1.44 ± 0.04 1.50 ± 0.07 1.32 ± 0.07 0.78 ± 0.08 <0.001c,d L-T4 dose change, μg/kg 0.14 ± 0.02 −0.21 ± 0.03 −0.64 ± 0.05 <0.001b,c,d TSH, mu/L 2.21 ± 0.13 1.85 ± 0.25 3.93 ± 0.38 9.49 ± 0.80 <0.001b,c,d TSH change, mu/L −0.18 ± 0.33 1.60 ± 0.42 7.23 ± 0.86 <0.001c,d Free T4, ng/dL 1.67 ± 0.03 1.79 ± 0.06 1.64 ± 0.07 1.34 ± 0.05 <0.001c,d Free T4, ng/dL change 0.13 ± 0.07 −0.04 ± 0.07 −0.32 ± 0.06 <0.001c,d Free T3, pg/dL 214 ± 4.2 201.4 ± 6.0 191.1 ± 6.2 184.1 ± 6.6 0.15 Free T3, pg/dL change −18.9 ± 9.3 −14.4 ± 6.9 −32.4 ± 7.9 0.27 Values are mean ± standard error of the mean. Change variables represent the differences between end of study and baseline for each arm. Differences between arms were tested with analysis of variance, and follow-up post hoc Tukey multiple comparisons were used to determine which arms were significantly different at the 5% level. Abbreviations: Postnone, postmenopausal, no hormone treatment; Poston, postmenopausal on hormone treatment; Prenone, premenopausal, no hormone treatment; Preon, premenopausal on hormone treatment. a Values are at baseline for each arm. b Arm 1 vs Arm 2. 
c Arm 2 vs Arm 3. d Arm 1 vs Arm 3. View Large Table 1. Clinical Parameters and Thyroid Function Tests at Baseline and End of Study, Analyzed as Intention to Treat End of Study Baseline Arm 1 Low Normal TSH Arm 2 High Normal TSH Arm 3 Mildly Elevated TSH P No. of subjects 138 46 47 45 Age, y 49.2 ± 1 49.5 ± 1.7 50.9 ± 1.8 49.3 ± 1.6 0.77 WAIS-R scorea 10.9 ± 0.2 10.8 ± 0.3 11.0 ± 0.3 10.8 ± 0.3 0.88 Years in schoola 15.9 ± 0.3 15.6 ± 0.5 16.1 ± 0.4 15.9 ± 0.4 0.74 Sexa 91% Female 41 (89.1%) 41 (87.2%) 43 (95.6%) 0.45 9% Male 5 (10.9%) 6 (12.8%) 2 (4.4%) Estrogen statusa 9% Male 5 (10.9%) 6 (12.8%) 2 (4.4%) 0.50 39% Prenone 19 (41.3%) 15 (31.9%) 20 (44.4%) 9% Preon 4 (8.7%) 5 (10.6%) 4 (8.9%) 38% Postnone 18 (39.1%) 19 (40.4%) 15 (33.3%) 4% Poston 0 (0%) 2 (4.3%) 4 (8.9%) Ethnicitya 92% White 44 (95.7%) 41 (87.2%) 42 (93.3%) 0.35 8% Other 2 (4.3%) 6 (12.8%) 3 (6.7%) BMI, kg/m2 27.8 ± 0.5 28.6 ± 0.8 27.3 ± 0.8 27.6 ± 1.0 0.58 L-T4 duration of treatment, ya 11.9 ± 0.8 9.9 ± 1.3 15.9 ± 1.7 9.7 ± 0.9 <0.001b,c L-T4 duration at current dose, ya 1.63 ± 0.19 1.75 ± 0.35 1.50 ± 0.18 1.63 ± 0.40 0.86 L-T4 dose, μg/kg 1.44 ± 0.04 1.50 ± 0.07 1.32 ± 0.07 0.78 ± 0.08 <0.001c,d L-T4 dose change, μg/kg 0.14 ± 0.02 −0.21 ± 0.03 −0.64 ± 0.05 <0.001b,c,d TSH, mu/L 2.21 ± 0.13 1.85 ± 0.25 3.93 ± 0.38 9.49 ± 0.80 <0.001b,c,d TSH change, mu/L −0.18 ± 0.33 1.60 ± 0.42 7.23 ± 0.86 <0.001c,d Free T4, ng/dL 1.67 ± 0.03 1.79 ± 0.06 1.64 ± 0.07 1.34 ± 0.05 <0.001c,d Free T4, ng/dL change 0.13 ± 0.07 −0.04 ± 0.07 −0.32 ± 0.06 <0.001c,d Free T3, pg/dL 214 ± 4.2 201.4 ± 6.0 191.1 ± 6.2 184.1 ± 6.6 0.15 Free T3, pg/dL change −18.9 ± 9.3 −14.4 ± 6.9 −32.4 ± 7.9 0.27 End of Study Baseline Arm 1 Low Normal TSH Arm 2 High Normal TSH Arm 3 Mildly Elevated TSH P No. 
of subjects 138 46 47 45 Age, y 49.2 ± 1 49.5 ± 1.7 50.9 ± 1.8 49.3 ± 1.6 0.77 WAIS-R scorea 10.9 ± 0.2 10.8 ± 0.3 11.0 ± 0.3 10.8 ± 0.3 0.88 Years in schoola 15.9 ± 0.3 15.6 ± 0.5 16.1 ± 0.4 15.9 ± 0.4 0.74 Sexa 91% Female 41 (89.1%) 41 (87.2%) 43 (95.6%) 0.45 9% Male 5 (10.9%) 6 (12.8%) 2 (4.4%) Estrogen statusa 9% Male 5 (10.9%) 6 (12.8%) 2 (4.4%) 0.50 39% Prenone 19 (41.3%) 15 (31.9%) 20 (44.4%) 9% Preon 4 (8.7%) 5 (10.6%) 4 (8.9%) 38% Postnone 18 (39.1%) 19 (40.4%) 15 (33.3%) 4% Poston 0 (0%) 2 (4.3%) 4 (8.9%) Ethnicitya 92% White 44 (95.7%) 41 (87.2%) 42 (93.3%) 0.35 8% Other 2 (4.3%) 6 (12.8%) 3 (6.7%) BMI, kg/m2 27.8 ± 0.5 28.6 ± 0.8 27.3 ± 0.8 27.6 ± 1.0 0.58 L-T4 duration of treatment, ya 11.9 ± 0.8 9.9 ± 1.3 15.9 ± 1.7 9.7 ± 0.9 <0.001b,c L-T4 duration at current dose, ya 1.63 ± 0.19 1.75 ± 0.35 1.50 ± 0.18 1.63 ± 0.40 0.86 L-T4 dose, μg/kg 1.44 ± 0.04 1.50 ± 0.07 1.32 ± 0.07 0.78 ± 0.08 <0.001c,d L-T4 dose change, μg/kg 0.14 ± 0.02 −0.21 ± 0.03 −0.64 ± 0.05 <0.001b,c,d TSH, mu/L 2.21 ± 0.13 1.85 ± 0.25 3.93 ± 0.38 9.49 ± 0.80 <0.001b,c,d TSH change, mu/L −0.18 ± 0.33 1.60 ± 0.42 7.23 ± 0.86 <0.001c,d Free T4, ng/dL 1.67 ± 0.03 1.79 ± 0.06 1.64 ± 0.07 1.34 ± 0.05 <0.001c,d Free T4, ng/dL change 0.13 ± 0.07 −0.04 ± 0.07 −0.32 ± 0.06 <0.001c,d Free T3, pg/dL 214 ± 4.2 201.4 ± 6.0 191.1 ± 6.2 184.1 ± 6.6 0.15 Free T3, pg/dL change −18.9 ± 9.3 −14.4 ± 6.9 −32.4 ± 7.9 0.27 Values are mean ± standard error of the mean. Change variables represent the differences between end of study and baseline for each arm. Differences between arms were tested with analysis of variance, and follow-up post hoc Tukey multiple comparisons were used to determine which arms were significantly different at the 5% level. Abbreviations: Postnone, postmenopausal, no hormone treatment; Poston, postmenopausal on hormone treatment; Prenone, premenopausal, no hormone treatment; Preon, premenopausal on hormone treatment. a Values are at baseline for each arm. b Arm 1 vs Arm 2. 
c Arm 2 vs Arm 3. d Arm 1 vs Arm 3. View Large Health status and mood by intention to treat At the end of the study, SF-36 Physical Functioning and POMS anger subscales were higher in the high-normal TSH compared with the low-normal TSH arm (49% vs 26% high, P = 0.03; 4.9 ± 0.7 vs 3.6 ± 0.6, P = 0.03), but these differences were not significant after correction for multiple testing. There were no other differences between the three arms in health status or mood measures (Table 2). Analyzing TSH, fT4, and fT3 as continuous variables, the SF-36 Mental Health subscale decreased by 0.33 point for each 1-mU/L increase in TSH (P = 0.05). There were no significant correlations between TSH, fT4, or fT3 and other health status or mood measures (Table 3). Table 2. End-of-Study Health Status and Mood Measures for Each Arm, Analyzed by Intention to Treat Measure Arm 1 Low-Normal TSH Arm 2 High-Normal TSH Arm 3 Mildly Elevated TSH P Billewicz  Billewicz Score 2.8 ± 0.4 4.0 ± 0.5 4.1 ± 0.4 0.38 Thyroid Disease Questionnaire   Thyroid Disease Questionnaire weighted average −1.6 ± 0.2 −1.4 ± 0.2 −1.9 ± 0.3 0.53 SF-36  Mental component summary 41.0 ± 0.9 39.8 ± 1.1 39.4 ± 1.0 0.66  Physical component summary 48.5 ± 0.7 50.6 ± 0.7 49.2 ± 0.8 0.28  General health 62.2 ± 1.6 64.3 ± 1.7 62.6 ± 2.1 0.86  Mental health 68.6 ± 1.6 66.0 ± 1.7 65.1 ± 1.7 0.12  Vitality 45.6 ± 2.7 50.5 ± 2.5 44.6 ± 2.7 0.36  Bodily pain (BP)a 21% high 23% high 33% high 0.69  Physical functioning (PF)a 26% high 49% high 31% high 0.03b  Role physical (RP)a 62% high 79% high 55% high 0.08  Social functioning (SF)a 62% high 53% high 52% high 0.62  Role emotional (RE)a 74% high 74% high 62% high 0.43 POMSc  Anger 3.6 ± 0.6 4.9 ± 0.7 4.3 ± 0.6 0.03b  Confusion 6.1 ± 0.5 6.6 ± 0.5 6.0 ± 0.4 0.87  Depression 4.4 ± 0.9 5.2 ± 0.9 5.4 ± 0.8 0.83  Fatigue 7.4 ± 0.8 6.7 ± 0.9 6.9 ± 0.8 0.42  Tension 5.2 ± 0.5 6.7 ± 0.5 7.2 ± 0.7 0.22  Vigor 14.9 ± 0.9 16.8 ± 0.9 14.8 ± 0.9 0.83 Affective Lability Score  Bipolar 0.6 ± 0.1 
0.7 ± 0.1 0.7 ± 0.1 0.42  Depression 0.9 ± 0.1 0.9 ± 0.1 1.0 ± 0.1 0.69  Elation 0.7 ± 0.1 0.8 ± 0.1 0.8 ± 0.1 0.89  Angerd 61% score >0 57% score >0 62% score >0 0.67  Anxietyd 67% score >0 83% score >0 84% score >0 0.21  Anxiety Depressiond 63% score >0 66% score >0 76% score >0 0.58 Measure Arm 1 Low-Normal TSH Arm 2 High-Normal TSH Arm 3 Mildly Elevated TSH P Billewicz  Billewicz Score 2.8 ± 0.4 4.0 ± 0.5 4.1 ± 0.4 0.38 Thyroid Disease Questionnaire   Thyroid Disease Questionnaire weighted average −1.6 ± 0.2 −1.4 ± 0.2 −1.9 ± 0.3 0.53 SF-36  Mental component summary 41.0 ± 0.9 39.8 ± 1.1 39.4 ± 1.0 0.66  Physical component summary 48.5 ± 0.7 50.6 ± 0.7 49.2 ± 0.8 0.28  General health 62.2 ± 1.6 64.3 ± 1.7 62.6 ± 2.1 0.86  Mental health 68.6 ± 1.6 66.0 ± 1.7 65.1 ± 1.7 0.12  Vitality 45.6 ± 2.7 50.5 ± 2.5 44.6 ± 2.7 0.36  Bodily pain (BP)a 21% high 23% high 33% high 0.69  Physical functioning (PF)a 26% high 49% high 31% high 0.03b  Role physical (RP)a 62% high 79% high 55% high 0.08  Social functioning (SF)a 62% high 53% high 52% high 0.62  Role emotional (RE)a 74% high 74% high 62% high 0.43 POMSc  Anger 3.6 ± 0.6 4.9 ± 0.7 4.3 ± 0.6 0.03b  Confusion 6.1 ± 0.5 6.6 ± 0.5 6.0 ± 0.4 0.87  Depression 4.4 ± 0.9 5.2 ± 0.9 5.4 ± 0.8 0.83  Fatigue 7.4 ± 0.8 6.7 ± 0.9 6.9 ± 0.8 0.42  Tension 5.2 ± 0.5 6.7 ± 0.5 7.2 ± 0.7 0.22  Vigor 14.9 ± 0.9 16.8 ± 0.9 14.8 ± 0.9 0.83 Affective Lability Score  Bipolar 0.6 ± 0.1 0.7 ± 0.1 0.7 ± 0.1 0.42  Depression 0.9 ± 0.1 0.9 ± 0.1 1.0 ± 0.1 0.69  Elation 0.7 ± 0.1 0.8 ± 0.1 0.8 ± 0.1 0.89  Angerd 61% score >0 57% score >0 62% score >0 0.67  Anxietyd 67% score >0 83% score >0 84% score >0 0.21  Anxiety Depressiond 63% score >0 66% score >0 76% score >0 0.58 Values are mean ± standard error of the mean. Significant differences between arms are shown in bold. 
P values for continuous outcomes were adjusted for age, years of education, WAIS-R score, BMI at baseline, change in BMI, sex and estrogen status, baseline TSH (low-normal vs high-normal), standard deviation of TSH values at all but first visit, baseline LT-4 dose, time on LT-4, time on LT-4 dose, and baseline value of outcome. P values for binary outcomes were adjusted for baseline TSH (low-normal vs high-normal) and baseline value of outcome. SF-36 and POMS outcomes were grouped, and multiple testing P value adjustments were made for these outcomes. a For BP, PF, RP, SF, and RE, the distributions within each group were highly skewed. We used the highest observed values of the skewed SF-36 subscales as the cutoff points for producing a dichotomous measure. The highest observed values were BP 75, PF 100, RP 50, SF 80, and RE 50. b Arms 1 and 2 were significantly different at the 5% level based on follow-up pairwise comparisons of all three arms based on Tukey multiple comparisons. However, these P values were not significant when we grouped outcomes from the same test together and applied multiple testing adjustments by using a Bonferroni correction to all individual Tukey adjusted P values comparing the three arms. c POMS values (except for the vigor subscale) were natural log-transformed before analysis because the raw data were skewed. All values were increased by 1 before the transformation because of the presence of zeros. d These scores were compared as the proportion positive between the groups, because the measures were skewed and contained a large proportion of zeros. View Large Table 2. 
Table 3. Correlations Between Changes in Thyroid Hormone Levels and Health Status and Mood Measures at End of Study

| Instrument | Measure | fT4 coefficient (95% CI) | P | fT3 coefficient (95% CI) | P | TSH coefficient (95% CI) | P |
|---|---|---|---|---|---|---|---|
| Billewicz | Billewicz Score | −0.13 (−1.2 to 0.95) | 0.82 | 0.07 (−0.04 to 0.18) | 0.20 | 0.04 (−0.07 to 0.14) | 0.49 |
| Thyroid Disease Questionnaire | Weighted average | 0.18 (−0.26 to 0.62) | 0.42 | 0.02 (−0.03 to 0.06) | 0.41 | −0.04 (−0.08 to 0.00) | 0.07 |
| SF-36 | Mental component summary | 0.22 (−1.96 to 2.41) | 0.84 | 0.11 (−0.13 to 0.34) | 0.37 | −0.05 (−0.27 to 0.17) | 0.66 |
| | Physical component summary | 0.40 (−1.27 to 2.06) | 0.64 | 0.03 (−0.15 to 0.20) | 0.75 | −0.01 (−0.18 to 0.15) | 0.89 |
| | General health | 0.24 (−3.35 to 3.84) | 0.89 | 0.09 (−0.29 to 0.47) | 0.65 | 0.04 (−0.32 to 0.41) | 0.82 |
| | Mental health | 1.97 (−1.36 to 5.30) | 0.24 | 0.19 (−0.17 to 0.54) | 0.30 | −0.33 (−0.66 to 0.00) | 0.05 |
| | Vitality | 1.67 (−3.30 to 6.63) | 0.51 | 0.20 (−0.34 to 0.74) | 0.46 | −0.14 (−0.65 to 0.36) | 0.57 |
| | Bodily pain (BP)^a | −2% (−64% to 161%) | 0.96 | −8% (−20% to 3%) | 0.17 | 3% (−6% to 13%) | 0.53 |
| | Physical functioning (PF)^a | 72% (−35% to 378%) | 0.28 | 3% (−7% to 16%) | 0.54 | −4% (−13% to 5%) | 0.40 |
| | Role physical (RP)^a | 35% (−43% to 239%) | 0.50 | 3% (−7% to 14%) | 0.57 | −2% (−10% to 6%) | 0.61 |
| | Social functioning (SF)^a | 7% (−59% to 183%) | 0.90 | 7% (−4% to 19%) | 0.24 | 0% (−8% to 9%) | 0.96 |
| | Role emotional (RE)^a | 93% (−26% to 451%) | 0.19 | 7% (−3% to 19%) | 0.20 | −5% (−12% to 4%) | 0.27 |
| POMS^b | Anger | −0.25 (−0.55 to 0.06) | 0.11 | −0.0006 (−0.0319 to 0.0306) | 0.97 | 0.02 (−0.01 to 0.05) | 0.21 |
| | Confusion | −0.04 (−0.20 to 0.13) | 0.64 | 0.001 (−0.016 to 0.018) | 0.89 | 0.004 (−0.012 to 0.021) | 0.58 |
| | Depression | −0.17 (−0.50 to 0.17) | 0.32 | −0.01 (−0.04 to 0.03) | 0.72 | 0.01 (−0.02 to 0.04) | 0.51 |
| | Fatigue | −0.002 (−0.286 to 0.282) | 0.99 | −0.003 (−0.032 to 0.026) | 0.84 | 0.002 (−0.025 to 0.03) | 0.86 |
| | Tension | −0.14 (−0.31 to 0.04) | 0.13 | −0.01 (−0.03 to 0.01) | 0.26 | 0.005 (−0.012 to 0.022) | 0.58 |
| | Vigor | −0.54 (−2.54 to 1.45) | 0.59 | −0.09 (−0.29 to 0.11) | 0.37 | 0.03 (−0.16 to 0.23) | 0.73 |
| Affective Lability Score | Bipolar | 0.02 (−0.15 to 0.18) | 0.85 | −0.002 (−0.018 to 0.014) | 0.80 | −0.002 (−0.018 to 0.014) | 0.79 |
| | Depression | −0.05 (−0.22 to 0.12) | 0.60 | −0.01 (−0.02 to 0.01) | 0.48 | 0.003 (−0.013 to 0.020) | 0.71 |
| | Elation | −0.02 (−0.18 to 0.14) | 0.80 | −0.005 (−0.021 to 0.011) | 0.56 | 0.0004 (−0.0153 to 0.016) | 0.96 |
| | Anger^c | 29% (−48% to 233%) | 0.59 | −1% (−10% to 9%) | 0.83 | 4% (−4% to 13%) | 0.41 |
| | Anxiety^c | 2% (−67% to 231%) | 0.97 | −4% (−15% to 8%) | 0.49 | 5% (−5% to 16%) | 0.37 |
| | Anxiety Depression^c | −33% (−80% to 111%) | 0.51 | 2% (−10% to 15%) | 0.81 | 0% (−9% to 11%) | >0.99 |

Correlations [95% confidence intervals (CIs)] with continuous outcomes were modeled with multiple linear regressions adjusting for age, years of education, WAIS-R score, BMI at baseline, change in BMI, sex and estrogen status, standard deviation of TSH values at all but first visit, baseline L-T4 dose, time on L-T4, time on L-T4 dose, baseline hormone value, and baseline value of outcome. The magnitude of the coefficient indicates the estimated change in the outcome for each 1-unit increase in the study change (end of study – baseline) of fT4 or TSH and a 10-unit increase in the study change of fT3. Correlations (95% CIs) with binary outcomes were modeled with multiple logistic regressions adjusting for the baseline value of the hormone and the outcome of interest. Coefficients were transformed to estimate the percentage change in the predicted odds of the measure for a 1-unit increase in the study change (end of study – baseline) of fT4 or TSH and a 10-unit increase in the study change of fT3. The transformed coefficients are estimates of the risk ratios associated with the 1- or 10-unit increase in the study change of respective hormone level. A positive coefficient indicates that the measure increased with increasing hormone levels, whereas a negative coefficient indicates that the measure decreased with increasing hormone levels. Separate models were run for each hormone. Significant coefficients are shown in bold with corresponding P values.
^a For these variables, the distributions within each group were highly skewed. We used the highest observed values of the skewed SF-36 subscales as the cutoff points for producing a dichotomous measure. The highest observed values were BP 75, PF 100, RP 50, SF 80, and RE 50.
^b POMS values (except for the vigor subscale) were natural log-transformed before analysis because the raw data were skewed. All values were increased by 1 before the transformation because of the presence of zeros. The magnitude of the coefficient indicates the estimated change in the natural log of the measure plus 1 with a 1-unit (10 units for fT3) increase.
^c These scores were compared as the proportion positive between the groups, because the measures were skewed and contained a large proportion of zeros.
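The two coefficient transformations described in the Table 3 notes can be made concrete with a short sketch. It assumes the standard relationships: a logistic regression coefficient β corresponds to an odds ratio of exp(β), so the percentage change in odds is 100 × (exp(β) − 1), and a coefficient b on the ln(score + 1) scale multiplies (score + 1) by exp(b). The function names are hypothetical, and the two inputs are rounded values taken from Table 3 (Role emotional vs fT4, and POMS Anger vs fT4), so the results are approximate.

```python
import math


def pct_change_in_odds(beta: float) -> float:
    """Percentage change in odds implied by a logistic regression coefficient."""
    return 100.0 * (math.exp(beta) - 1.0)


def beta_from_pct(pct: float) -> float:
    """Inverse of pct_change_in_odds: recover the coefficient from a percentage."""
    return math.log1p(pct / 100.0)


def multiplier_on_score_plus_one(coef: float) -> float:
    """Factor applied to (score + 1) per step in the hormone change, for a
    coefficient estimated on the ln(score + 1) scale."""
    return math.exp(coef)


# Role emotional vs fT4 is reported as a 93% change in odds (odds ratio 1.93).
beta_re = beta_from_pct(93.0)
print(f"implied logistic coefficient: {beta_re:.3f}")

# POMS Anger vs fT4 is reported as -0.25 on the ln(score + 1) scale.
anger_mult = multiplier_on_score_plus_one(-0.25)
print(f"(score + 1) multiplier: {anger_mult:.3f}")
```

Read this way, a reported 0% corresponds to an odds ratio of 1, and negative percentages correspond to odds ratios below 1.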
Cognitive tests by intention to treat
At the end of the study, the percentage of subjects with no errors on the Letter Cancellation Test and the 1-back number correct on target were worse in the mildly elevated TSH arm than in the low-normal TSH arm (11% vs 30%, P = 0.02; 68% vs 84%, P = 0.02), but these differences were not significant after correction for multiple testing. There were no other differences between the three arms in cognitive outcomes (Table 4). With TSH, fT4, and fT3 analyzed as continuous variables, there were a few correlations between fT4 or fT3 and individual outcomes, but only one remained significant after correction for multiple comparisons (Pursuit Rotor Trial 3 time on target, which was inversely related to fT4 levels) (Table 5).
Table 4. End-of-Study Cognitive Measures for Each Arm Analyzed by Intention to Treat

| Subdomain | Test | Measure | Arm 1 Low-Normal TSH | Arm 2 High-Normal TSH | Arm 3 Mildly Elevated TSH | P |
|---|---|---|---|---|---|---|
| Executive function | Letter Cancellation Test | Time, s | 102.0 ± 3.3 | 102.6 ± 3.1 | 100.4 ± 3.0 | 0.64 |
| | | % With no errors | 30% | 26% | 11% | 0.02^a |
| | Trail Making Test | Time, s | 23.3 ± 1.0 | 23.2 ± 1.0 | 21.9 ± 0.8 | 0.26 |
| | | ABC time, s | 56.7 ± 2.8 | 59.9 ± 3.1 | 54.9 ± 2.9 | 0.70 |
| | | % With errors | 11% | 9% | 11% | 0.91 |
| | | % With ABC errors | 28% | 26% | 22% | 0.76 |
| | Iowa Gambling Test | Net-1 | 0.7 ± 1.6 | −0.7 ± 1.1 | −2.4 ± 1.1 | 0.48 |
| | | Net-2 | 5.4 ± 1.3 | 4.9 ± 1.3 | 4.0 ± 1.3 | 0.73 |
| | | Net-3 | 7.2 ± 1.6 | 8.2 ± 1.3 | 6.8 ± 1.5 | 0.60 |
| | | Net-4 | 9.5 ± 1.6 | 7.5 ± 1.4 | 9.1 ± 1.5 | 0.58 |
| | | Net-5 | 7.9 ± 1.6 | 7.5 ± 1.5 | 8.8 ± 1.6 | 0.84 |
| Working memory | N-Back number correct on target | 1-Back^b | 84% | 75% | 68% | 0.02^a |
| | | 2-Back^b | 65% | 65% | 70% | 0.81 |
| | | 3-Back | 11.7 ± 0.3 | 11.5 ± 0.4 | 11.5 ± 0.4 | 0.75 |
| | N-Back number incorrect nontarget | 1-Back^b | 14% | 18% | 20% | 0.43 |
| | | 2-Back^b | 23% | 25% | 34% | 0.31 |
| | | 3-Back | 3.0 ± 0.2 | 3.5 ± 0.3 | 3.1 ± 0.3 | 0.38 |
| | Subject-Ordered Pointing errors | 6 | 0.4 ± 0.1 | 0.5 ± 0.1 | 0.4 ± 0.1 | 0.68 |
| | | 8 | 0.9 ± 0.1 | 0.8 ± 0.1 | 0.9 ± 0.1 | 0.73 |
| | | 10 | 1.1 ± 0.1 | 1.1 ± 0.1 | 1.1 ± 0.1 | 0.60 |
| | | 12 | 1.5 ± 0.2 | 1.5 ± 0.2 | 1.5 ± 0.2 | 0.80 |
| Declarative memory | Paragraph Recall | Immediate | 13.3 ± 0.4 | 13.5 ± 0.5 | 14.0 ± 0.4 | 0.70 |
| | | 30-Min delay | 11.9 ± 0.4 | 12.1 ± 0.5 | 12.8 ± 0.4 | 0.68 |
| Motor learning | Pursuit Rotor Trial, time on target, s | 1 | 38.7 ± 2.1 | 34.8 ± 2.0 | 41.5 ± 2.2 | 0.51 |
| | | 2 | 39.6 ± 2.3 | 37.0 ± 2.0 | 43.8 ± 2.1 | 0.58 |
| | | 3 | 39.9 ± 2.3 | 37.4 ± 1.9 | 44.0 ± 2.2 | 0.44 |
| | | 4 | 41.8 ± 2.3 | 39.4 ± 1.9 | 45.0 ± 2.2 | 0.68 |
| | Motor Sequence Learning Test | Total movement time, s | 1114.4 ± 43.7 | 1125.3 ± 36.8 | 1019.6 ± 35.9 | 0.46 |

Values are mean ± standard error of the mean. Significant differences between arms are shown in bold. Individual tests are grouped by cognitive subdomains (first column). P values for continuous outcomes were adjusted for age, years of education, WAIS-R score, BMI at baseline, change in BMI, sex and estrogen status, baseline TSH (low-normal vs high-normal), standard deviation of TSH values at interim and last visits, baseline L-T4 dose, time on L-T4, time on L-T4 dose, and baseline value of outcome. P values for binary outcomes were adjusted for baseline TSH (low-normal vs high-normal) and baseline value of outcome. Letter Cancellation and N-Back outcomes were grouped, and multiple testing P value adjustments were made for these outcomes.
^a Arms 1 and 3 were significantly different at the 5% level based on follow-up pairwise comparisons of all three arms based on Tukey multiple comparisons. However, these P values were not significant when we grouped outcomes from the same test together and applied multiple testing adjustments by using Bonferroni correction to all individual Tukey adjusted P values comparing the three arms.
^b These values were calculated as proportion of subjects for whom each measure was ≥15 (for correct on target) or >0 (for incorrect nontarget), because there were floor effects.

Table 5.
Correlations Between Changes in Thyroid Hormone Levels and Cognitive Measures at End of Study

| Subdomain | Test | Measure | fT4 coefficient (95% CI) | P | fT3 coefficient (95% CI) | P | TSH coefficient (95% CI) | P |
|---|---|---|---|---|---|---|---|---|
| Executive function | Letter Cancellation Test | Time, s | 1.85 (−5.37 to 9.07) | 0.61 | 0.60 (−0.14 to 1.33) | 0.11 | −0.09 (−0.80 to 0.62) | 0.80 |
| | | % With no errors | 13% (−57% to 182%) | 0.80 | −3% (−13% to 7%) | 0.56 | −9% (−20% to 2%) | 0.13 |
| | Trail Making Test | Time, s | 1.04 (−1.04 to 3.13) | 0.32 | −0.10 (−0.31 to 0.11) | 0.34 | −0.03 (−0.24 to 0.17) | 0.75 |
| | | ABC time, s | −4.14 (−10.33 to 2.05) | 0.19 | 0.12 (−0.52 to 0.75) | 0.72 | −0.04 (−0.65 to 0.57) | 0.89 |
| | | % With errors | −23% (−82% to 177%) | 0.70 | −5% (−18% to 8%) | 0.45 | 3% (−10% to 14%) | 0.66 |
| | | % With ABC errors | −11% (−66% to 121%) | 0.81 | −8% (−18% to 1%) | 0.10 | −5% (−14% to 4%) | 0.29 |
| | Iowa Gambling Test | Net-1 | −0.24 (−3.91 to 3.43) | 0.90 | −0.40 (−0.77 to −0.03) | 0.03^a | 0.09 (−0.27 to 0.46) | 0.61 |
| | | Net-2 | −1.73 (−5.31 to 1.84) | 0.34 | −0.23 (−0.60 to 0.14) | 0.21 | −0.01 (−0.36 to 0.34) | 0.97 |
| | | Net-3 | 0.49 (−3.09 to 4.08) | 0.79 | 0.09 (−0.28 to 0.45) | 0.65 | 0.10 (−0.25 to 0.45) | 0.58 |
| | | Net-4 | −1.88 (−5.72 to 1.97) | 0.34 | −0.51 (−0.89 to −0.12) | 0.01^a | 0.001 (−0.374 to 0.377) | >0.99 |
| | | Net-5 | −0.39 (−4.25 to 3.48) | 0.84 | −0.21 (−0.61 to 0.19) | 0.30 | 0.07 (−0.31 to 0.44) | 0.73 |
| Working memory | N-Back number correct on target | 1-Back^b | 252% (18% to 1118%) | 0.03^a | 9% (−2% to 23%) | 0.14 | −7% (−14% to 2%) | 0.11 |
| | | 2-Back^b | 1% (−60% to 159%) | >0.99 | 7% (−3% to 20%) | 0.20 | −2% (−10% to 8%) | 0.70 |
| | | 3-Back | 0.17 (−0.76 to 1.10) | 0.71 | −0.04 (−0.14 to 0.05) | 0.36 | 0.05 (−0.04 to 0.14) | 0.31 |
| | N-Back number incorrect nontarget | 1-Back^b | −22% (−77% to 136%) | 0.67 | 0% (−11% to 11%) | 0.95 | 3% (−7% to 14%) | 0.51 |
| | | 2-Back^b | 27% (−52% to 236%) | 0.63 | 3% (−8% to 14%) | 0.63 | 5% (−4% to 14%) | 0.29 |
| | | 3-Back | −0.28 (−1.07 to 0.51) | 0.48 | 0.05 (−0.04 to 0.13) | 0.27 | −0.01 (−0.09 to 0.06) | 0.74 |
| | Subject-Ordered Pointing errors | 6 | 0.02 (−0.14 to 0.18) | 0.80 | −0.01 (−0.02 to 0.01) | 0.52 | 0.01 (−0.01 to 0.02) | 0.27 |
| | | 8 | 0.11 (−0.16 to 0.38) | 0.42 | 0.0003 (−0.0280 to 0.0286) | 0.98 | 0.002 (−0.024 to 0.029) | 0.88 |
| | | 10 | 0.10 (−0.14 to 0.35) | 0.42 | −0.003 (−0.029 to 0.023) | 0.82 | 0.02 (−0.01 to 0.04) | 0.18 |
| | | 12 | 0.05 (−0.28 to 0.39) | 0.75 | −0.02 (−0.05 to 0.02) | 0.27 | −0.003 (−0.036 to 0.029) | 0.83 |
| Declarative memory | Paragraph Recall | Immediate | −0.33 (−1.33 to 0.67) | 0.52 | −0.02 (−0.12 to 0.08) | 0.71 | 0.04 (−0.05 to 0.14) | 0.38 |
| | | 30-min delay | −0.51 (−1.53 to 0.51) | 0.32 | 0.01 (−0.10 to 0.11) | 0.86 | 0.08 (−0.02 to 0.18) | 0.11 |
| Motor learning | Pursuit Rotor Trial, time on target, s | 1 | −5.13 (−9.77 to −0.48) | 0.03^a | 0.10 (−0.40 to 0.59) | 0.70 | 0.18 (−0.30 to 0.65) | 0.47 |
| | | 2 | −5.78 (−10.62 to −0.93) | 0.02^a | 0.16 (−0.35 to 0.68) | 0.53 | 0.22 (−0.29 to 0.72) | 0.40 |
| | | 3 | −6.70 (−11.25 to −2.14) | 0.004^c | 0.09 (−0.40 to 0.58) | 0.72 | 0.30 (−0.17 to 0.77) | 0.21 |
| | | 4 | −2.19 (−7.09 to 2.71) | 0.38 | 0.10 (−0.41 to 0.61) | 0.70 | 0.15 (−0.34 to 0.65) | 0.54 |
| | Motor Sequence Learning Test | Total movement time, s | 31.00 (−14.82 to 76.82) | 0.18 | 2.31 (−2.45 to 7.07) | 0.34 | −1.08 (−5.74 to 3.58) | 0.65 |

Correlations [95% confidence intervals (CIs)] with continuous outcomes were modeled with multiple linear regressions adjusting for age, years of education, WAIS-R score, BMI at baseline, change in BMI, sex and estrogen status, standard deviation of TSH values at all but first visit, baseline L-T4 dose, time on L-T4, time on L-T4 dose, baseline hormone value, and baseline value of outcome. The magnitude of the coefficient indicates the estimated change in the outcome for each 1-unit increase in the study change (end of study – baseline) of fT4 or TSH and a 10-unit increase in the study change of fT3. Correlations (95% CIs) with binary outcomes were modeled with multiple logistic regressions adjusting for baseline values of the hormone and the outcome of interest. Coefficients were transformed to estimate the percentage change in the predicted odds of the measure for a 1-unit increase in the study change (end of study – baseline) of fT4 or TSH, and a 10-unit increase in the study change of fT3. The transformed coefficients are estimates of the risk ratios associated with the 1- or 10-unit increase in the study change of respective hormone level. A positive coefficient indicates that the measure increased with increasing hormone levels, whereas a negative coefficient indicates that the measure decreased with increasing hormone levels. Separate models were run for each hormone. Significant coefficients are shown in bold with corresponding unadjusted P values. For each set of related outcome measures, multiple testing adjustments were applied to all the individual P values from models adjusting for the same hormone type.
^a P was not significant at the 0.05 level when we applied multiple testing adjustments grouping outcomes from the same test together by Bonferroni correction.
^b These values were calculated as proportion of subjects for whom each measure was ≥15 (for correct on target) or >0 (for incorrect nontarget), because there were floor effects.
^c P was still significant at the 0.05 level when we applied multiple testing adjustments grouping outcomes from the same test together by Bonferroni correction.
Correlations Between Changes in Thyroid Hormone Levels and Cognitive Measures at End of Study fT4 fT3 TSH Test Coefficient P Coefficient P Coefficient P Executive function Letter Cancellation Test  Time, s 1.85 (−5.37 to 9.07) 0.61 0.60 (−0.14 to 1.33) 0.11 −0.09 (−0.80 to 0.62) 0.80  % With no errors 13% (−57% to 182%) 0.80 −3% (−13% to 7%) 0.56 −9% (−20% to 2%) 0.13 Trail Making Test  Time, s 1.04 (−1.04 to 3.13) 0.32 −0.10 (−0.31 to 0.11) 0.34 −0.03 (−0.24 to 0.17) 0.75  ABC time, s −4.14 (−10.33 to 2.05) 0.19 0.12 (−0.52 to 0.75) 0.72 −0.04 (−0.65 to 0.57) 0.89  % With errors −23% (−82% to 177%) 0.70 −5% (−18% to 8%) 0.45 3% (−10% to 14%) 0.66  % With ABC errors −11% (−66% to 121%) 0.81 −8% (−18% to 1%) 0.10 −5% (−14% to 4%) 0.29 Iowa Gambling Test  Net-1 −0.24 (−3.91 to 3.43) 0.90 −0.40 (−0.77 to −0.03) 0.03a 0.09 (−0.27 to 0.46) 0.61  Net-2 −1.73 (−5.31 to 1.84) 0.34 −0.23 (−0.60 to 0.14) 0.21 −0.01 (−0.36 to 0.34) 0.97  Net-3 0.49 (−3.09 to 4.08) 0.79 0.09 (−0.28 to 0.45) 0.65 0.10 (−0.25 to 0.45) 0.58  Net-4 −1.88 (−5.72 to 1.97) 0.34 -0.51 (−0.89 to −0.12) 0.01a 0.001 (−0.374 to 0.377) >0.99  Net-5 −0.39 (−4.25 to 3.48) 0.84 −0.21 (−0.61 to 0.19) 0.30 0.07 (−0.31 to 0.44) 0.73 Working memory N-Back number correct  on target  1-Backb 252% (18% to 1118%) 0.03a 9% (−2% to 23%) 0.14 −7% (−14% to 2%) 0.11  2-Backb 1% (−60% to 159%) >0.99 7% (−3% to 20%) 0.20 −2% (−10% to 8%) 0.70  3-Back 0.17 (−0.76 to 1.10) 0.71 −0.04 (−0.14 to 0.05) 0.36 0.05 (−0.04 to 0.14) 0.31 N-Back number  incorrect nontarget  1-Backb −22% (−77% to 136%) 0.67 0% (−11% to 11%) 0.95 3% (−7% to 14%) 0.51  2-Backb 27% (−52% to 236%) 0.63 3% (−8% to 14%) 0.63 5% (−4% to 14%) 0.29  3-Back −0.28 (−1.07 to 0.51) 0.48 0.05 (−0.04 to 0.13) 0.27 −0.01 (−0.09 to 0.06) 0.74 Subject-Ordered  Pointing errors  6 0.02 (−0.14 to 0.18) 0.80 −0.01 (−0.02 to 0.01) 0.52 0.01 (−0.01 to 0.02) 0.27  8 0.11 (−0.16 to 0.38) 0.42 0.0003 (−0.0280 to 0.0286) 0.98 0.002 (−0.024 to 0.029) 0.88  10 0.10 (−0.14 to 0.35) 
0.42 −0.003 (−0.029 to 0.023) 0.82 0.02 (−0.01 to 0.04) 0.18  12 0.05 (−0.28 to 0.39) 0.75 −0.02 (−0.05 to 0.02) 0.27 −0.003 (−0.036 to 0.029) 0.83 Declarative memory Paragraph Recall  Immediate −0.33 (−1.33 to 0.67) 0.52 −0.02 (−0.12 to 0.08) 0.71 0.04 (−0.05 to 0.14) 0.38  30-min delay −0.51 (−1.53 to 0.51) 0.32 0.01 (−0.10 to 0.11) 0.86 0.08 (−0.02 to 0.18) 0.11 Motor learning Pursuit Rotor Trial   Time on target, s  1 −5.13 (−9.77 to −0.48) 0.03a 0.10 (−0.40 to 0.59) 0.70 0.18 (−0.30 to 0.65) 0.47   2 −5.78 (−10.62 to −0.93) 0.02a 0.16 (−0.35 to 0.68) 0.53 0.22 (−0.29 to 0.72) 0.40   3 −6.70 (−11.25 to −2.14) 0.004c 0.09 (−0.40 to 0.58) 0.72 0.30 (−0.17 to 0.77) 0.21   4 −2.19 (−7.09 to 2.71) 0.38 0.10 (−0.41 to 0.61) 0.70 0.15 (−0.34 to 0.65) 0.54 Motor Sequence  Learning Test  Total movement  time, s 31.00 (−14.82 to 76.82) 0.18 2.31 (−2.45 to 7.07) 0.34 −1.08 (−5.74 to 3.58) 0.65 fT4 fT3 TSH Test Coefficient P Coefficient P Coefficient P Executive function Letter Cancellation Test  Time, s 1.85 (−5.37 to 9.07) 0.61 0.60 (−0.14 to 1.33) 0.11 −0.09 (−0.80 to 0.62) 0.80  % With no errors 13% (−57% to 182%) 0.80 −3% (−13% to 7%) 0.56 −9% (−20% to 2%) 0.13 Trail Making Test  Time, s 1.04 (−1.04 to 3.13) 0.32 −0.10 (−0.31 to 0.11) 0.34 −0.03 (−0.24 to 0.17) 0.75  ABC time, s −4.14 (−10.33 to 2.05) 0.19 0.12 (−0.52 to 0.75) 0.72 −0.04 (−0.65 to 0.57) 0.89  % With errors −23% (−82% to 177%) 0.70 −5% (−18% to 8%) 0.45 3% (−10% to 14%) 0.66  % With ABC errors −11% (−66% to 121%) 0.81 −8% (−18% to 1%) 0.10 −5% (−14% to 4%) 0.29 Iowa Gambling Test  Net-1 −0.24 (−3.91 to 3.43) 0.90 −0.40 (−0.77 to −0.03) 0.03a 0.09 (−0.27 to 0.46) 0.61  Net-2 −1.73 (−5.31 to 1.84) 0.34 −0.23 (−0.60 to 0.14) 0.21 −0.01 (−0.36 to 0.34) 0.97  Net-3 0.49 (−3.09 to 4.08) 0.79 0.09 (−0.28 to 0.45) 0.65 0.10 (−0.25 to 0.45) 0.58  Net-4 −1.88 (−5.72 to 1.97) 0.34 -0.51 (−0.89 to −0.12) 0.01a 0.001 (−0.374 to 0.377) >0.99  Net-5 −0.39 (−4.25 to 3.48) 0.84 −0.21 (−0.61 to 0.19) 0.30 0.07 (−0.31 to 
0.44) 0.73 Working memory N-Back number correct  on target  1-Backb 252% (18% to 1118%) 0.03a 9% (−2% to 23%) 0.14 −7% (−14% to 2%) 0.11  2-Backb 1% (−60% to 159%) >0.99 7% (−3% to 20%) 0.20 −2% (−10% to 8%) 0.70  3-Back 0.17 (−0.76 to 1.10) 0.71 −0.04 (−0.14 to 0.05) 0.36 0.05 (−0.04 to 0.14) 0.31 N-Back number  incorrect nontarget  1-Backb −22% (−77% to 136%) 0.67 0% (−11% to 11%) 0.95 3% (−7% to 14%) 0.51  2-Backb 27% (−52% to 236%) 0.63 3% (−8% to 14%) 0.63 5% (−4% to 14%) 0.29  3-Back −0.28 (−1.07 to 0.51) 0.48 0.05 (−0.04 to 0.13) 0.27 −0.01 (−0.09 to 0.06) 0.74 Subject-Ordered  Pointing errors  6 0.02 (−0.14 to 0.18) 0.80 −0.01 (−0.02 to 0.01) 0.52 0.01 (−0.01 to 0.02) 0.27  8 0.11 (−0.16 to 0.38) 0.42 0.0003 (−0.0280 to 0.0286) 0.98 0.002 (−0.024 to 0.029) 0.88  10 0.10 (−0.14 to 0.35) 0.42 −0.003 (−0.029 to 0.023) 0.82 0.02 (−0.01 to 0.04) 0.18  12 0.05 (−0.28 to 0.39) 0.75 −0.02 (−0.05 to 0.02) 0.27 −0.003 (−0.036 to 0.029) 0.83 Declarative memory Paragraph Recall  Immediate −0.33 (−1.33 to 0.67) 0.52 −0.02 (−0.12 to 0.08) 0.71 0.04 (−0.05 to 0.14) 0.38  30-min delay −0.51 (−1.53 to 0.51) 0.32 0.01 (−0.10 to 0.11) 0.86 0.08 (−0.02 to 0.18) 0.11 Motor learning Pursuit Rotor Trial   Time on target, s  1 −5.13 (−9.77 to −0.48) 0.03a 0.10 (−0.40 to 0.59) 0.70 0.18 (−0.30 to 0.65) 0.47   2 −5.78 (−10.62 to −0.93) 0.02a 0.16 (−0.35 to 0.68) 0.53 0.22 (−0.29 to 0.72) 0.40   3 −6.70 (−11.25 to −2.14) 0.004c 0.09 (−0.40 to 0.58) 0.72 0.30 (−0.17 to 0.77) 0.21   4 −2.19 (−7.09 to 2.71) 0.38 0.10 (−0.41 to 0.61) 0.70 0.15 (−0.34 to 0.65) 0.54 Motor Sequence  Learning Test  Total movement  time, s 31.00 (−14.82 to 76.82) 0.18 2.31 (−2.45 to 7.07) 0.34 −1.08 (−5.74 to 3.58) 0.65 Correlations [95% confidence intervals (CIs)] with continuous outcomes were modeled with multiple linear regressions adjusting for age, years of education, WAIS-R score, BMI at baseline, change in BMI, sex and estrogen status, standard deviation of TSH values at all but first visit, baseline 
LT-4 dose, time on LT-4, time on LT-4 dose, baseline hormone value, and baseline value of outcome. The magnitude of the coefficient indicates the estimated change in the outcome for each 1-unit increase in the study change (end of study – baseline) of fT4 or TSH and a 10-unit increase in the study change of fT3. Correlations (95% CIs) with binary outcomes were modeled with multiple logistic regressions adjusting for baseline values of the hormone and the outcome of interest. Coefficients were transformed to estimate the percentage change in the predicted odds of the measure for a 1-unit increase in the study change (end of study – baseline) of fT4 or TSH, and a 10-unit increase in the study change of fT3. The transformed coefficients are estimates of the risk ratios associated with the 1- or 10-unit increase in the study change of respective hormone level. A positive coefficient indicates that the measure increased with increasing hormone levels, whereas a negative coefficient indicates that the measure decreased with increasing hormone levels. Separate models were run for each hormone. Significant coefficients are shown in bold with corresponding unadjusted P values. For each set of related outcome measures, multiple testing adjustments were applied to all the individual P values from models adjusting for the same hormone type. a P was not significant at the 0.05 level when we applied multiple testing adjustments grouping outcomes from the same test together by Bonferroni correction. b These values were calculated as proportion of subjects for whom each measure was ≥15 (for correct on target) or >0 (for incorrect nontarget), because there were floor effects. c P was still significant at the 0.05 level when we applied multiple testing adjustments grouping outcomes from the same test together by Bonferroni correction. 
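The two calculations described in the Table 5 footnotes can be illustrated with a minimal Python sketch. It is not the authors' analysis code (they report using R); the function names are hypothetical, and treating the four Pursuit Rotor trials as one Bonferroni family is an assumption based on the footnote's "grouping outcomes from the same test together." The first function converts a logistic regression coefficient (log odds per unit) into a percentage change in odds, and the second applies a Bonferroni threshold to a family of unadjusted P values.

```python
import math


def percent_change_in_odds(beta, delta=1.0):
    """Percentage change in odds for a `delta`-unit increase in the predictor.

    `beta` is a logistic-regression coefficient on the log-odds scale.
    Use delta=1 for fT4 or TSH and delta=10 for fT3, as in the table footnote.
    """
    return (math.exp(beta * delta) - 1.0) * 100.0


def bonferroni_flags(p_values, alpha=0.05):
    """Return True for each unadjusted P value below alpha / (family size)."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]


# Unadjusted fT4 P values for Pursuit Rotor trials 1 to 4, copied from Table 5.
# Assumption: these four trials form one Bonferroni family (threshold 0.0125).
pursuit_rotor_p = [0.03, 0.02, 0.004, 0.38]
print(bonferroni_flags(pursuit_rotor_p))  # [False, False, True, False]
```

Under that grouping assumption, only trial 3 remains below the corrected threshold, which agrees with the a and c footnote markers on those rows.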
Analyses by actual TSH arm at the end of the study

By the actual TSH arm at the end of the study, 57 subjects had TSH levels in the low-normal range, 28 in the high-normal range, and 53 in the mildly elevated range (Supplemental Table 1). Subjects did not differ in terms of any baseline demographic, clinical, or thyroid hormone variables. Mean L-T4 doses at the end of the study were progressively lower in the three arms (1.52 ± 0.06, 1.10 ± 0.10, and 0.92 ± 0.08 μg/kg/day, respectively, P < 0.001), whereas mean TSH levels were progressively higher (1.34 ± 0.08, 3.74 ± 0.12, and 9.74 ± 0.63 mU/L, P < 0.001). Mean fT4 and fT3 levels were lower in the mildly elevated TSH arm (1.89 ± 0.06, 1.44 ± 0.08, and 1.35 ± 0.04 ng/dL, respectively, P < 0.001; 206.2 ± 5.6, 196.0 ± 7.7, and 175.2 ± 5.4 pg/dL, P < 0.001). Thirty-three subjects in the low-normal TSH arm (72%), 19 in the high-normal TSH arm (40%), and 44 in the mildly elevated TSH arm (98%) had low fT3 levels (82 to 209 pg/dL).

By the actual TSH arm at the end of the study, the SF-36 Bodily Pain subscale was higher in the mildly elevated TSH arm than in the high-normal TSH arm (34% vs 11% high, P = 0.03), and the 1-back number correct on target was lower in the high-normal TSH arm than in the low-normal TSH arm (58% vs 86%, P = 0.002) (Supplemental Tables 2 and 3); neither was significant after correction for multiple testing. There were no other differences between the three arms in health status, mood, or cognitive measures.

Subjects’ perceptions of L-T4 doses

At the final study visit, subjects were asked whether they thought their L-T4 doses at the end of the study were higher, lower, or unchanged from the start of the study and which of the two doses they preferred. Subjects were not able to accurately ascertain changes in L-T4 doses (P = 0.54) (Supplemental Table 4).
However, the majority preferred whichever L-T4 dose they thought was higher (P < 0.001): 68% preferred their dose at the end of the study when they thought their dose had been increased during the study, whereas 96% preferred their dose at the beginning of the study when they thought their dose had been lowered during the study.

Effect size calculations

We performed effect size calculations by using results for the SF-36 mental component summary and mental health scales, POMS depression scale, 3-back correct on target, and Iowa Gambling Task-5, outcomes affected by mild thyroid dysfunction in our previous studies (7, 29, 30). The necessary sample sizes to achieve 80% power at a 5% level of significance were 659 to 5442 subjects (28) (data not shown).

Discussion

In this cohort of L-T4-treated subjects, we found little evidence that altering L-T4 doses in a randomized, blinded fashion to achieve TSH levels in the low-normal, high-normal, or mildly elevated range affected health status, mood, or cognitive function over 6 months. After correction for multiple testing, no outcomes were significantly different when the data were analyzed by discrete groups, either as intention to treat or by actual TSH arm achieved at the end of the study. When the data were analyzed by continuous variables, the SF-36 mental health subscale was inversely correlated with TSH levels, but the magnitude of this correlation was small, and there were no other significant findings.

Most published studies of subclinical hypothyroid subjects are observational (1), and the most recent and largest failed to find significant quality of life, mood, or cognitive effects (31–36). These studies were often limited by the use of screening cognitive batteries, which are not designed to detect subtle defects in targeted cognitive domains likely to be affected by altered thyroid status.
We used sensitive, specific cognitive measures to circumvent these limitations, based on human studies indicating that memory and executive function are preferentially affected, as well as animal studies of thyroid hormone and its receptor distribution in the brain (1).

Seven previous studies have assessed effects of L-T4 therapy on symptoms or neurocognitive outcomes in patients with subclinical hypothyroidism (3, 5, 6, 8–11). Three reported improvements in depression or memory after 6 months, but they were open-label (3, 6, 9), although one also reported a neural substrate for thyroid effects in the frontal cortex by functional magnetic resonance imaging (6). Four were blinded and found minor or no effects after 3 to 12 months (5, 8, 10, 11). Our study extends these findings with detailed measures of cognitive areas that have not been intensively studied. With the exception of Stott et al.’s (11) recent study, where the primary outcome was a tiredness score, ours is also the largest interventional study in subclinical hypothyroidism.

The literature regarding neurocognitive measures in euthyroid or L-T4-treated subjects has similar limitations. Most studies were observational, with only two small interventional trials of L-T4 therapy in subjects with normal TSH levels treated for 8 to 12 weeks. Neither found effects on hypothyroid symptoms, quality of life, psychological function, or limited measures of cognitive function (2, 4). Our results in a larger group of subjects treated for a longer time period extend these findings. Our findings do not support the idea that lowering TSH levels to <2.50 mU/L (37) improves quality of life, mood, or cognitive function.

A major strength of our study was our focus on executive function.
This cognitive domain has not been extensively studied in thyroid disease, because rodent models do not adequately represent executive functions in humans, and many laboratory measures of executive function are insensitive to real-world scenarios. We included the Iowa Gambling Task because this test of executive function assesses decision making under uncertainty and models real-world behavior (38). L-T4-treated patients often complain of problems in this area, but our results do not corroborate objective changes in executive function when L-T4 doses are altered.

Another major strength of our study was the blinded nature of our intervention. When we queried subjects, they could not accurately identify how their L-T4 doses had been altered, but the majority preferred whichever dose they perceived to be the higher dose, confirming an intrinsic bias toward higher L-T4 doses. Studies also indicate that self-knowledge of a thyroid disorder impairs psychological well-being regardless of the TSH level (30, 39), which would bias unblinded studies.

We found a high prevalence of low serum fT3 levels at baseline and at the end of the study in all three arms. However, fT3 levels did not correlate with our outcomes. Previous reports have also described a high prevalence of low T3 levels in L-T4-treated subjects (40). However, studies of liothyronine add-on or monotherapy in hypothyroidism have not shown improvements in quality of life, mood, or cognitive outcomes (40). Additional studies have suggested that polymorphisms in deiodinase or brain thyroid hormone transporter genes correlate with psychological scores and response to liothyronine, so subsets of L-T4-treated patients may respond to L-T3 (40).

Our study also has limitations. A major limitation was our sample size, and it is possible that we were underpowered to detect small effects.
To address this problem, we performed an effect size calculation, which showed that large numbers of subjects (>600, depending on outcome) would need to be studied to reach statistical significance. The small magnitude of our effects suggests that clinically meaningful alterations are unlikely, but it remains possible that subtle effects were missed.

We did not include an untreated euthyroid control group, so we cannot ascertain whether our subjects had decrements in quality of life, mood, or cognition at baseline compared with the general population. However, we previously published results of the same tests of quality of life, mood, and cognitive function in L-T4-treated subjects compared with euthyroid control subjects and found mild decrements in the SF-36 (mental component summary, mental health subscale, and vitality subscale) without differences in mood or cognitive function (30). Therefore, we suspect that subjects in the current study had slightly lower quality of life than matched euthyroid subjects.

We performed a large number of correlations, and although we accounted for this in our analyses, it is possible that some of our minor findings were due to chance. Most of our subjects were women and were younger and slimmer than the U.S. population. Most of our subjects were Caucasian. Our subjects were heterogeneous in terms of thyroid diagnosis and length of L-T4 treatment. We limited our study to 6 months to optimize subject retention, recognizing that this is sufficient time to observe changes in our outcomes.

Many of our subjects experienced variations in TSH levels at interim visits that necessitated L-T4 dose adjustments, which we accounted for in our analysis. One-third of our subjects did not achieve target TSH levels, particularly in the high-normal TSH group.
To address this limitation, we conducted separate intention-to-treat and actual end-of-study analyses, as well as analyses using changes in TSH and thyroid hormones as continuous variables. These complementary analyses showed similar results, strengthening our conclusions. In addition, we note that regardless of the ultimate TSH attained, L-T4 doses were altered in each arm, consistent with the study design. Because patients often request changes in their L-T4 doses regardless of their TSH levels, an interpretation of our results based on L-T4 dose adjustments is a valuable perspective for clinical practice.

We attempted to collect blood samples at a consistent time of day, but this was not always possible. In healthy and L-T4-treated subjects, TSH levels decrease slightly between 07:00 and 09:00 and then remain stable until the evening (41). Finally, we limited our cognitive testing to executive function and memory, although studies do not indicate major effects in other areas (1).

In summary, we found no relevant differences in health status, mood, memory, or executive functions in hypothyroid subjects when L-T4 doses were altered in a randomized, blinded fashion to achieve TSH levels in the low-normal, high-normal, or mildly elevated range. Given our limited sample size, additional studies would be helpful, particularly in targeted populations (e.g., symptomatic subjects, subjects with low fT3 levels, or subjects with genetic polymorphisms that affect thyroid hormone action). In the absence of definitive data, reasonable expectations should be discussed with treated hypothyroid patients who report symptoms in these areas and request higher L-T4 doses or alternative thyroid hormone preparations.
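The sample-size reasoning behind the effect size calculations mentioned in the Results and Discussion can be sketched as follows. The article cites Cohen (28) but does not state the formula or the effect sizes used, so this is only an illustration under stated assumptions: a two-sided comparison of means between two equal-sized groups, the standard normal approximation, and hypothetical standardized effect sizes (Cohen's d). The function name `n_per_group` is likewise hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate subjects needed per group for a two-sample comparison.

    Normal approximation: n = 2 * ((z_(1 - alpha/2) + z_power) / d) ** 2,
    where d is the standardized difference between group means.
    This is an assumed textbook formula, not the authors' stated method.
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2.0 * ((z_alpha + z_power) / effect_size) ** 2)


# Hypothetical effect sizes: the required n grows with 1 / d**2.
for d in (0.5, 0.2, 0.1):
    print(d, n_per_group(d))
```

Because the required sample size scales with the inverse square of the effect size, halving the detectable effect roughly quadruples the number of subjects, which is why small effects lead to required samples in the hundreds or thousands.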
Abbreviations: ALS, Affective Lability Scale; BMI, body mass index; CV, coefficient of variation; fT3, free triiodothyronine; fT4, free thyroxine; L-T4, levothyroxine; OHSU, Oregon Health & Science University; POMS, Profile of Mood States; SF-36, 36-Item Short Form Health Survey; TSH, thyrotropin; WAIS-R, Wechsler Adult Intelligence Scale–Revised.

Acknowledgments

We thank the staff of the OHSU Clinical and Translational Research Center for excellent patient care and research support and the Biostatistics & Design Program for data analysis expertise.

Financial Support: This work was supported by National Institutes of Health Grants R01 DK075496 (to M.H.S.) and UL1 RR024120 (to OHSU).

Clinical Trial Information: ClinicalTrials.gov no. NCT00565864 (registered November 30, 2007).

Disclosure Summary: The authors have nothing to disclose.

References

1. Samuels MH. Thyroid disease and cognition. Endocrinol Metab Clin North Am. 2014;43(2):529–543.
2. Pollock MA, Sturrock A, Marshall K, Davidson KM, Kelly CJ, McMahon AD, McLaren EH. Thyroxine treatment in patients with symptoms of hypothyroidism but thyroid function tests within the reference range: randomised double blind placebo controlled crossover trial. BMJ. 2001;323(7318):891–895.
3. Bono G, Fancellu R, Blandini F, Santoro G, Mauri M. Cognitive and affective status in mild hypothyroidism and interactions with L-thyroxine treatment. Acta Neurol Scand. 2004;110(1):59–66.
4. Walsh JP, Ward LC, Burke V, Bhagat CI, Shiels L, Henley D, Gillett MJ, Gilbert R, Tanner M, Stuckey BG. Small changes in thyroxine dosage do not produce measurable changes in hypothyroid symptoms, well-being, or quality of life: results of a double-blind, randomized clinical trial. J Clin Endocrinol Metab. 2006;91(7):2624–2630.
5. Jorde R, Waterloo K, Storhaug H, Nyrnes A, Sundsfjord J, Jenssen TG. Neuropsychological function and symptoms in subjects with subclinical hypothyroidism and the effect of thyroxine treatment. J Clin Endocrinol Metab. 2006;91(1):145–153.
6. Zhu DF, Wang ZX, Zhang DR, Pan ZL, He S, Hu XP, Chen XC, Zhou JN. fMRI revealed neural substrate for reversible working memory dysfunction in subclinical hypothyroidism. Brain. 2006;129(Pt 11):2923–2930.
7. Samuels MH, Schuff KG, Carlson NE, Carello P, Janowsky JS. Health status, mood, and cognition in experimentally induced subclinical hypothyroidism. J Clin Endocrinol Metab. 2007;92(7):2545–2551.
8. Razvi S, Ingoe L, Keeka G, Oates C, McMillan C, Weaver JU. The beneficial effect of L-thyroxine on cardiovascular risk factors, endothelial function, and quality of life in subclinical hypothyroidism: randomized, crossover trial. J Clin Endocrinol Metab. 2007;92(5):1715–1723.
9. Correia N, Mullally S, Cooke G, Tun TK, Phelan N, Feeney J, Fitzgibbon M, Boran G, O’Mara S, Gibney J. Evidence for a specific defect in hippocampal memory in overt and subclinical hypothyroidism. J Clin Endocrinol Metab. 2009;94(10):3789–3797.
10. Parle J, Roberts L, Wilson S, Pattison H, Roalfe A, Haque MS, Heath C, Sheppard M, Franklyn J, Hobbs FD. A randomized controlled trial of the effect of thyroxine replacement on cognitive function in community-living elderly subjects with subclinical hypothyroidism: the Birmingham Elderly Thyroid study. J Clin Endocrinol Metab. 2010;95(8):3623–3632.
11. Stott DJ, Rodondi N, Kearney PM, Ford I, Westendorp RGJ, Mooijaart SP, Sattar N, Aubert CE, Aujesky D, Bauer DC, Baumgartner C, Blum MR, Browne JP, Byrne S, Collet TH, Dekkers OM, den Elzen WPJ, Du Puy RS, Ellis G, Feller M, Floriani C, Hendry K, Hurley C, Jukema JW, Kean S, Kelly M, Krebs D, Langhorne P, McCarthy G, McCarthy V, McConnachie A, McDade M, Messow M, O’Flynn A, O’Riordan D, Poortvliet RKE, Quinn TJ, Russell A, Sinnott C, Smit JWA, Van Dorland HA, Walsh KA, Walsh EK, Watt T, Wilson R, Gussekloo J; TRUST Study Group. Thyroid hormone therapy for older adults with subclinical hypothyroidism. N Engl J Med. 2017;376(26):2534–2544.
12. Spreen O, Strauss EA. General intellectual ability and assessment of premorbid intelligence. In: Spreen O, Strauss EA, eds. A Compendium of Neuropsychological Tests: Administration, Norms, and Commentary. New York, NY: Oxford University Press; 1998:90–102.
13. Billewicz WZ, Chapman RS, Crooks J, Day ME, Gossage J, Wayne E, Young JA. Statistical methods applied to the diagnosis of hypothyroidism. Q J Med. 1969;38(150):255–266.
14. McMillan C, Bradley C, Razvi S, Weaver J. Evaluation of new measures of the impact of hypothyroidism on quality of life and symptoms: the ThyDQoL and ThySRQ. Value Health. 2008;11(2):285–294.
15. Spreen O, Strauss EA. Adaptive behavior and personality. In: Spreen O, Strauss EA, eds. A Compendium of Neuropsychological Tests: Administration, Norms, and Commentary. New York, NY: Oxford University Press; 1998:612–616.
16. Spreen O, Strauss EA. Adaptive behavior and personality. In: Spreen O, Strauss EA, eds. A Compendium of Neuropsychological Tests: Administration, Norms, and Commentary. New York, NY: Oxford University Press; 1998:644–646.
17. Harvey PD, Greenberg BR, Serper MR. The affective lability scales: development, reliability, and validity. J Clin Psychol. 1989;45(5):786–793.
18. Byrd DA, Touradji P, Tang MX, Manly JJ. Cancellation test performance in African American, Hispanic, and White elderly. J Int Neuropsychol Soc. 2004;10(3):401–411.
19. Lezak MD, Howieson DB, Loring DW. Orientation and attention. In: Lezak MD, Howieson DB, Loring DW, eds. Neuropsychological Assessment. New York, NY: Oxford University Press; 1995:371–374.
20. Singh V, Khan A. Heterogeneity in choices on Iowa Gambling Task: preference for infrequent-high magnitude punishment. Mind Soc. 2009;8(1):43–57.
21. Lezak MD, Howieson DB, Loring DW. Orientation and attention. In: Lezak MD, Howieson DB, Loring DW, eds. Neuropsychological Assessment. New York, NY: Oxford University Press; 1996:363–364.
22. Spreen O, Strauss EA. Executive functions. In: Spreen O, Strauss EA, eds. A Compendium of Neuropsychological Tests: Administration, Norms, and Commentary. New York, NY: Oxford University Press; 1998:208–212.
23. Lezak M, Howieson DB, Loring DW. Memory I: tests. In: Lezak M, Howieson DB, Loring DW, eds. Neuropsychological Assessment. New York, NY: Oxford University Press; 1995:444–450.
24. van Gorp WG, Altshuler L, Theberge DC, Mintz J. Declarative and procedural memory in bipolar disorder. Biol Psychiatry. 1999;46(4):525–531.
25. Spreen O, Strauss EA. Motor tests. In: Spreen O, Strauss EA, eds. A Compendium of Neuropsychological Tests: Administration, Norms and Commentary. New York, NY: Oxford University Press; 1998:577–599.
26. Spencer CA, Hollowell JG, Kazarosyan M, Braverman LE. National Health and Nutrition Examination Survey III thyroid-stimulating hormone (TSH)–thyroperoxidase antibody relationships demonstrate that TSH upper reference limits may be skewed by occult thyroid dysfunction. J Clin Endocrinol Metab. 2007;92(11):4236–4240.
27. R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2015. Available at: http://www.R-project.org/. Accessed 5 June 2017.
28. Cohen J. Statistical Power Analysis for the Behavioral Sciences. Mahwah, NJ: Lawrence Erlbaum Associates; 1988.
29. Samuels MH, Kolobova I, Smeraglio A, Niederhausen M, Janowsky JS, Schuff KG. Effects of thyroid function variations within the laboratory reference range on health status, mood, and cognition in levothyroxine-treated subjects. Thyroid. 2016;26(9):1173–1184.
30. Samuels MH, Kolobova I, Smeraglio A, Peters D, Janowsky JS, Schuff KG. The effects of levothyroxine replacement or suppressive therapy on health status, mood, and cognition. J Clin Endocrinol Metab. 2014;99(3):843–851.
31. Roberts LM, Pattison H, Roalfe A, Franklyn J, Wilson S, Hobbs FD, Parle JV. Is subclinical thyroid dysfunction in the elderly associated with depression or cognitive dysfunction? Ann Intern Med. 2006;145(8):573–581.
32. Wijsman LW, de Craen AJ, Trompet S, Gussekloo J, Stott DJ, Rodondi N, Welsh P, Jukema JW, Westendorp RG, Mooijaart SP. Subclinical thyroid dysfunction and cognitive decline in old age. PLoS One. 2013;8(3):e59199.
33. Fjaellegaard K, Kvetny J, Allerup PN, Bech P, Ellervik C. Well-being and depression in individuals with subclinical hypothyroidism and thyroid autoimmunity: a general population study. Nord J Psychiatry. 2015;69(1):73–78.
34. van de Ven AC, Netea-Maier RT, de Vegt F, Ross HA, Sweep FC, Kiemeney LA, Hermus AR, den Heijer M. Is there a relationship between fatigue perception and the serum levels of thyrotropin and free thyroxine in euthyroid subjects? Thyroid. 2012;22(12):1236–1243.
35. Klaver EI, van Loon HC, Stienstra R, Links TP, Keers JC, Kema IP, Kobold AC, van der Klauw MM, Wolffenbuttel BH. Thyroid hormone status and health-related quality of life in the LifeLines Cohort Study. Thyroid. 2013;23(9):1066–1073.
36. Engum A, Bjøro T, Mykletun A, Dahl AA. An association between depression, anxiety and thyroid function--a clinical fact or an artefact? Acta Psychiatr Scand. 2002;106(1):27–34.
37. Wartofsky L, Dickey RA. The evidence for a narrower thyrotropin reference range is compelling. J Clin Endocrinol Metab. 2005;90(9):5483–5488.
38. Winstanley CA, Clark L. Translational models of gambling-related decision-making. Curr Top Behav Neurosci. 2016;28:93–120.
39. Panicker V, Evans J, Bjøro T, Asvold BO, Dayan CM, Bjerkeset O. A paradoxical difference in relationship between anxiety, depression and thyroid function in subjects on and not on T4: findings from the HUNT study. Clin Endocrinol (Oxf). 2009;71(4):574–580.
40. Jonklaas J, Bianco AC, Bauer AJ, Burman KD, Cappola AR, Celi FS, Cooper DS, Kim BW, Peeters RP, Rosenthal MS, Sawka AM; American Thyroid Association Task Force on Thyroid Hormone Replacement. Guidelines for the treatment of hypothyroidism: prepared by the American Thyroid Association Task Force on Thyroid Hormone Replacement. Thyroid. 2014;24(12):1670–1751.
41. Roelfsema F, Veldhuis JD. Thyrotropin secretion patterns in health and disease. Endocr Rev. 2013;34(5):619–657.

Copyright © 2018 Endocrine Society

Journal

Journal of Clinical Endocrinology and Metabolism, Oxford University Press

Published: Mar 2, 2018
