Task Complexity, Cognitive Load, and L1 Speech

Abstract

Relationships among task characteristics, L2 performance, and interlanguage development are of interest both for SLA research and the design of syllabuses and language teaching materials. Complexity has been identified as a promising, but methodologically problematic, task design feature. A study was conducted of the effects of progressive increases in the complexity (operationalized as number of elements) of three versions of each of three tasks on the syntactic complexity and lexical diversity of the speech of 42 English native speakers. Data on native speaker performance are important because they reveal task complexity effects unfiltered by non-native competence. Independent evidence that greater task complexity increased cognitive load was shown by participant self-ratings of perceived difficulty, mental effort, and stress, shorter prospective duration estimates and, using dual-task methodology, slower reaction times. Mid-complex versions of the three tasks elicited the most complex syntactic structures, and the most complex versions elicited the greatest lexical diversity. Implications are noted for the design of parallel studies with non-native speakers, along with suggested methodological improvements for future research with native and non-native populations.

INTRODUCTION

Over the past 50 years, SLA has witnessed a considerable number of pedagogical models and approaches which have undergone many changes through continuous trial and error. Recent approaches that are prevalent to this day, for example Communicative Language Teaching, the Natural Approach, Content-Based Instruction, and Task-Based Language Teaching (TBLT), have shifted their attention from the teacher to the learner’s needs and interests.
Among such approaches, TBLT is a particularly learner-centered one for its focus on the importance of a logically conducted analysis of learners’ needs, which is an essential prerequisite for the motivation, design, and success of second and foreign language programs. Furthermore, it offers a solution to problems with existing approaches such as synthetic syllabuses and ‘focus on forms’, and analytic syllabuses and ‘focus on meaning’, by employing an analytic (task) syllabus and a ‘focus on form’ method that involves timely reactive attention to linguistic problems in context, as they arise during task performance (see Long 2015). Long (1985) first proposed the idea of task as a meaningful and viable unit of analysis in (i) identifying learners’ needs, (ii) defining syllabus content, (iii) organizing language acquisition opportunities, and (iv) measuring student achievement. Since the 1985 publication, tasks have been the subject of a considerable amount of research concerning their uses in both SLA research and language teaching. A key issue in TBLT is to identify principled criteria with which tasks can be classified and sequenced. Despite differing in rationale and predictions, Robinson (2001a, 2003, 2005, 2011) and Skehan (1996, 1998) each devised a model that shared the same underlying goal of providing such criteria. The two models have generated a proliferation of studies in recent years (see Robinson 2011; Skehan 2014). Unfortunately, most have reported mixed or null findings and have been unable to provide unambiguous evidence for or against either model. To address this issue, the present study investigates (i) whether increasing task complexity increases cognitive load, and (ii) whether differences in cognitive load affect native speaker performance in ways beneficial for L2 development.

MODELS OF TASK COMPLEXITY

Long (1996, 2015) and Long and Crookes (1992) have proposed that during syllabus design, pedagogic tasks should be sequenced in an order of increasing complexity (task complexity, not linguistic complexity), eventually resembling the full demands of real-world target tasks that a needs analysis shows learners need to be able to perform successfully. A model consistent with this proposal was developed by Robinson (2003, 2007, 2011): the Triadic Componential Framework and the Cognition Hypothesis (CH). The Triadic Componential Framework specifies three superordinate categories of tasks: task complexity (cognitive factors), condition (interactive factors), and difficulty (learner factors). The fundamental claim of the CH is that increases in task complexity are the logical basis for task sequencing and syllabus design because attentional resources are increasingly engaged as the cognitive demands of tasks are increased. Three major predictions are made: (i) increases in task complexity along resource-directing dimensions will push learners to greater accuracy and linguistic complexity, but less fluency, (ii) greater task complexity will promote interaction and negotiation of meaning, leading to heightened attention to, and incorporation of, task input and modification of output, and (iii) individual differences in ability and affective variables contributing to perceptions of task difficulty will differentiate performance and learning as task complexity increases. An alternative model for task sequencing was proposed by Skehan (1996, 1998) and Skehan and Foster (1997): the Limited Attentional Capacity Model (LACM), or the Trade-Off Hypothesis. Unlike the CH, the LACM assumes a single source of attention accessible to learners, whose limited capacity restricts the mapping of form-meaning relationships.
Skehan (1998) proposed that a learner can only attend to one aspect of linguistic performance, either accuracy or complexity, at the expense of the other. The trade-off between accuracy and complexity means that increased fluency may be accompanied by either greater accuracy or complexity (at best), but not by both at the same time (Skehan and Foster 1997). Task characteristics that are argued to affect the nature of performance include familiarity of information, interactivity, degree of structure, complexity of outcomes, and transformation of information.

TASK COMPLEXITY STUDIES AND THEIR LIMITATIONS

The LACM and CH have given rise to a plethora of studies, most of which focus on an increase in resource-directing dimensions that are relatively easier to operationalize, that is ±Here-and-Now, ±few elements, and ±intentional reasoning. In addition, other factors, such as h/l (high/low) L2 proficiency, ±monologic, and ±planning-time have been manipulated in numerous studies. Support for either model has been cited, but in fact, many have obtained mixed/null findings or have drawn conclusions based on faulty interpretations. In an effort to obtain support for the LACM, Foster and Skehan (1996) investigated the effects of task complexity and planning on language production. Thirty-two learners of English as a foreign language (EFL) with various L1 backgrounds performed three types of two-way oral tasks (personal information exchange, narrative, and decision-making). These tasks were assumed to differ in task complexity, with the personal information exchange task considered as the least complex, and the decision-making task the most complex. Participants were divided into three groups that differed in the conditions under which tasks were performed: no planning, undetailed planning, and detailed planning. The authors found that production was more fluent and more syntactically complex under planned conditions.
However, those in the undetailed planning condition produced the most accurate output. A significant interaction was found between task-type and planning condition such that planning effects on accuracy and linguistic complexity were greater in the more complex narrative and decision-making tasks than in the simple personal information exchange task. The authors concluded that their findings supported the LACM, since planning had a positive effect on fluency and complexity, but not accuracy, thus indicating a trade-off effect. Similar results were found in Skehan and Foster’s (1997) study. Forty EFL students with diverse L1 backgrounds performed the same task-types as in Foster and Skehan (1996). However, this study investigated the effects of planning (+planning vs. −planning) and ±knowledge of a post-task activity. It was found that students under planned conditions, as opposed to unplanned conditions, generally showed greater fluency, accuracy, and complexity in their oral output. When planning effects were compared across task-types, strong effects on accuracy were found in the narrative task but not the decision-making task. The reverse pattern was found for complexity, supporting the idea of a trade-off effect. Knowledge of a post-task activity had only a small effect on accuracy. In a small-scale study with 12 learners of English, Robinson (1995) found limited support for the CH. Participants of intermediate proficiency performed oral narrative tasks with increasing complexity along the ±Here-and-Now dimension. Support for the CH was found with regard to the proportions of lexical content words—learners used a greater variety of lexical words in the [−Here-and-Now] condition. However, no significant differences were found for accuracy and fluency, which were measured by five outcome measures. 
Robinson attributed the lack of significant findings to a small sample size, questionable reliability and validity of the outcome measures, relatively low proficiency of the learners, and use of one-way open tasks as opposed to two-way closed tasks. Bearing in mind the limitations of earlier work, Robinson (2001b) conducted a more fine-grained study and investigated the effects of task complexity, task sequencing, and task role on learners’ interactive production and their perceptions of difficulty. He predicted that task complexity would have similar effects on language production in both monologic and interactive tasks, except with regard to linguistic complexity. The assumption was that the nature of a complex interactive task would lead to reduced linguistic complexity, due to greater numbers of elliptical/single-clause answers to clarification requests and confirmation checks. Forty-four Japanese learners of English performed two versions of a direction-giving map task (MT) (±Here-and-Now). Robinson found that in the complex task, token type ratio and number of words per clause were significantly lower, and the number of confirmation checks was significantly higher than in the simple version. Perceptions of difficulty and stress were higher in the complex task, and the task sequence of simple to complex resulted in higher accuracy and fluency. These findings are congruent with the predictions of the CH, in that complex tasks would lead to increased lexical diversity and use of confirmation checks, and reduced fluency. With 46 university students of lower-intermediate English proficiency, Gilabert (2007) investigated how the ±Here-and-Now dimension interacted with ±planning time. Participants performed an oral narrative task with comic strips, and ±Here-and-Now was operationalized by present/past tense and presence/absence of contextual support. A total of 10 min were given in the [+planning time] condition, and 50 s in the [−planning time] condition.
Statistical analyses revealed that while there was a positive task complexity effect on accuracy and fluency in planned and unplanned conditions, structural complexity remained the same in both conditions. Increasing task complexity even reduced lexical complexity. Gilabert also found that the planned condition was more beneficial for fluency and lexical complexity. In short, this study supported predictions concerning ±planning time, but directions of the fluency and lexical complexity measures were counterevidence to the LACM and CH. Several studies compared the effects of monologic tasks with those of interactive tasks. In Michel et al.’s (2007) study, 44 learners of Dutch (L1 being either Moroccan or Turkish) performed an oral task that was increased in complexity along the ±few elements dimension. Participants were assigned to one of two groups: the monologic condition or the dialogic (dyad) condition. The oral task involved giving advice to a friend about buying a certain device. Those in the [+monologic] condition left a message over the phone, while those in the [−monologic] condition discussed the matter with their partner over the phone. The simple task involved two devices, and the complex task involved six. Results showed that increased task complexity promoted accuracy, slightly affected lexical complexity, but reduced fluency. Compared to monologic tasks, interactive tasks triggered greater accuracy and fluency, and lower structural complexity. An overall significant interaction between task complexity and interactivity was not found. When looking at interactions on specific outcome measures, learners were found to use more accurate speech in the complex monologic task. However, this beneficial effect disappeared in the complex dialogic task, which runs counter to the CH predictions of interactivity. 
In light of these results, the authors concluded that their study provided limited support for the CH, and rejected the existence of trade-off effects between accuracy and complexity. A similar study was conducted by Michel (2011), which investigated increased task complexity effects along the ±few elements dimension. This study distinguishes itself from others in that the researcher compared the performance of 64 learners of Dutch with that of 44 native speakers of Dutch. Two types of oral tasks were employed, in which participants had to choose the best dating or studying couple among four people (simple) or six people (complex). Michel found that increased complexity had a significant effect only on lexical diversity. There were no statistically significant differences between simple and complex tasks with regard to accuracy. Learners displayed greater accuracy, lexical diversity, and fluency on the interactive tasks, but less structural complexity on the monologic tasks. With respect to fluency measures, only native speakers showed a significant interaction effect between task complexity and interactivity such that they were more fluent in simple dialogues. Also taking into consideration the results of task difficulty judgments, which revealed that dialogues were considered easier than monologues, Michel concluded that monologues are cognitively more complex than dialogues, and that the study only provided limited support for the CH. Robinson’s and Skehan’s models of task complexity have generated considerable interest among educators and language researchers. However, much previous research is not without limitations. Three important problems include: (i) a lack of consistent operationalization of complexity dimensions, (ii) a lack of consistency in the choice and operationalization of outcome measures, and (iii) a failure to include native speaker data. 
Any one of these, or a combination, may have contributed to the mixed or null findings within and across studies. The LACM claims that code complexity, cognitive complexity, and communicative stress may influence the learner in such a way that attention is allocated to certain aspects of linguistic performance. However, it fails to explain how each dimension is operationalized, and how they interact with one another. Likewise, the more intricate CH provides no clear guidelines for operationalizing dimensions of task complexity nor suggests how task conditions and difficulty interact to affect performance. As a result, most researchers have manipulated task complexity along one or more of three dimensions: ± Here-and-Now, ±few elements, and ±intentional reasoning. With regard to measures of linguistic performance, Ellis (2011) criticized early work (Foster and Skehan 1996, and Robinson 1995) for their questionable operationalizations of complexity, accuracy, and fluency (CAF). Jackson and Suethanapornkul (2013) reported 84 different measures of complexity, accuracy, lexis, and fluency employed in just 17 CH studies. Norris and Ortega (2009) found 13 different measures of complexity alone in 16 studies. The number and types of performance measures show immense variation, and the lack of consistent findings may well be due to this variation. The third problem relates to the omission of native speaker data. Although the two task complexity frameworks pertain to L2 task performance and development, it would be beneficial, and potentially necessary, to test predictions with L1 speakers to establish a systematic relationship between task design manipulations and performance. Unlike L2 learners, who vary widely in individual differences, native speakers ‘have a full, homogeneous, and comparable, command of their L1’ (Long 2015, p. 239). 
As Foster and Tavakoli (2009) point out, native speaker data should be used as a baseline when investigating how learners perform language tasks because it enables us to distinguish the performance features that are due to L2 processing from those due to task performance. Using native speaker data initially ‘gives greater validity into claims that this is affected by the independent variable(s) under scrutiny’ (Foster and Tavakoli 2009, p. 868) and will provide a much more reliable window on task complexity effects. Any changes in performance, especially linguistic complexity, can be attributed to task complexity manipulations alone, without having to consider confounds resulting from task complexity effects having been filtered through the variability of learner competence (see Ellis 2011; and Long 2015).

VALIDATION OF TASK COMPLEXITY MANIPULATIONS

Numerous studies have claimed to find evidence supporting or refuting the LACM or the CH. However, Norris (2010) and Révész (2014) have argued that one important step has only been assumed and not empirically tested. To date, only a handful of studies have investigated whether task complexity effects actually lead to the desired changes in cognitive load, and whether those changes in turn cause an increase or reduction in accuracy, complexity, and fluency. They suggested four ways of addressing this issue: (i) subjective self-ratings, (ii) subjective time estimations, (iii) dual-task methodology, and (iv) psychophysiological techniques, such as eye-tracking. Stimulated recall protocols have also been used in recent TBLT studies as an introspective measure to tap into learner-internal processes (Kim et al. 2015; Révész et al. 2017; Rostamian et al. 2017). Self-ratings of difficulty have been employed in several studies testing the CH (Baralt 2013; Gilabert et al. 2009, 2011; Ishikawa 2011; Kim 2009; Michel 2011; Robinson 2001b; Sasayama 2016).
In a study of 44 Japanese learners of English performing an oral interactive task, Robinson (2001b) found that increasing task complexity along the dimensions of amount of information and availability of prior knowledge had a significant effect on learners’ ratings of overall difficulty and stress. However, ratings of interest and motivation were found to be unrelated to task complexity manipulations. Using the questionnaire created by Robinson (2001b), Gilabert et al. (2009) found that 60 learners of English perceived complex tasks to be more difficult and stressful, and felt less confident when performing them. On the other hand, there were no significant differences in their interest and motivation between simple and complex versions of tasks. Ishikawa (2011) investigated the performance of 46 Japanese learners of English performing oral tasks of increasing complexity along the ±intentional reasoning dimension. He found that complex reasoning was rated as more difficult than simple or no reasoning. Such studies show that self-ratings of difficulty and/or stress can be used to measure the cognitive load of a task. Regarding self-estimations of duration, the rationale is that time seems to pass more quickly when a person is performing a difficult or attention-demanding task as opposed to one that is easy or less attention-demanding. Estimation of a target duration comes from two paradigms: the prospective paradigm (experienced duration) and the retrospective paradigm (remembered duration) (Block and Zakay 2008). In the former, a person is aware during a time period that she/he must estimate its duration. In the latter, she/he is aware of making an estimation only after the time period has ended. Block et al.’s (2010) meta-analysis of 117 experiments found that if difficult processing is required, the ratio of subjective duration to objective duration decreases when a person is aware that time judgments are to be made.
Put simply, people feel that time goes faster when performing more difficult tasks. Zakay (1992) and Block (1992) also found that prospective judgments were shorter for difficult tasks than for easy tasks. Attention models in psychology predict that processors of temporal and non-temporal information share the same attentional resources. When the load of non-temporal information is increased, fewer attentional resources are allocated to temporal information (Block 1992, 2003). Accordingly, people are assumed to be less accurate in estimating the duration of a task when greater cognitive load is placed on them. Several studies have employed time estimations to measure the cognitive complexity of tasks (Baralt 2013; Malicka and Levkina 2012). In Baralt’s (2013) investigation of differential modality effects, 84 adult learners of Spanish estimated how long it took for them to perform a story retelling task after task completion (i.e. retrospective paradigm), which was compared with the actual time that was spent on task performance. Results showed that the estimated times for the groups that carried out complex tasks (+intentional reasoning) were significantly longer than the actual task performance time, but those for the groups that carried out simple tasks (−intentional reasoning) were significantly shorter than the actual time. Malicka and Levkina (2012) also employed the retrospective paradigm in their attempt to validate task complexity effects. In this study, 37 learners of Spanish performed an instruction-giving task whose complexity was operationalized along ±number of elements and ±intentional reasoning, and then were asked to choose the task that they felt took longer to complete. However, they were not asked to estimate the time in minutes and/or seconds. 
The authors claimed that high-proficiency learners were more inaccurate than low-proficiency learners at time estimation, such that they believed it took longer to complete the complex task than the actual time. Although these studies are noteworthy for being among the first to introduce time estimations to task complexity research, there are three major problems that should be addressed. First, time should be estimated in terms of minutes and/or seconds for comparison with actual time by calculating the ratio of estimations to actual time, as in the present study. In cognitive psychology, this is a standard measure ‘so that all scores exist on the same relative scale’ (Brown 1985, p. 118). This enables the comparison of ratios not only within a single task-type, but also across different task-types. However, previous studies used subtractions instead of ratios, making it nearly impossible to make accurate comparisons across task-types and even within the same task-type. Second, asking participants to choose the task that they believe took longer to perform may lead them to think there are cognitive differences between tasks when in actuality they may not believe so. Third, because time estimations are subjective and may even vary widely within participants, it is important to subject all participants to this method, instead of comparing time estimations between groups that performed one version of the same task-type. Dual-task performance and brain activity measures are claimed to be objective, direct methods for measuring cognitive load (Brunken et al. 2003). The dual-task paradigm assumes that simultaneous performance of two tasks (primary and secondary) will have an impact on the distribution of attentional resources. The underlying principle is that ‘performance on the secondary task, assessed in terms of reaction time and accuracy, mirrors the level of cognitive load generated by the primary task’ (Révész 2014, p. 90).
When there are different versions of a primary task that require attentional resources, performance in processing a secondary task will vary according to the cognitive load induced by the primary task. Three studies of particular interest that have investigated the validity of task complexity manipulations were those of Révész et al. (2014, 2015), and Sasayama (2016). In the first study, the cognitive load of two computer-delivered tasks, whose complexity was increased along the ± causal reasoning dimension, was measured by means of expert judgments, dual-task methodology, and eye-tracking. Two doctoral students in applied linguistics were asked to judge all 32 experimental items, with results showing that the versions intended to be more complex were rated as such. While 16 native speakers of English and 16 ESL learners performed the primary task of choosing a correct past event and orally producing a past counterfactual statement, the color of the computer screen changed to red or green at random intervals. Participants had to respond to these changes as quickly and accurately as possible. Although reaction times on the secondary task did not differ significantly between simple and complex task versions, accuracy rates were found to be a sufficiently sensitive measure of cognitive load. Participants achieved higher accuracy rates when performing the simple versions, and native speakers achieved higher accuracy rates than the ESL learners. Eye-tracking also provided support for the validity of task complexity manipulations in terms of fixation counts and fixation duration. It was also found that ESL learners showed longer fixation durations than native speakers, but not higher counts. Révész et al. (2015) attempted to validate task complexity using the dual-task method, participant self-ratings, and expert judgments. 
Forty-eight English native speakers and 48 ESL speakers performed three oral task-types: a picture narrative, an MT, and a decision-making task, each with a simple and complex version. The researchers adopted the dual-task method in Révész et al. (2014), with the secondary task requiring participants to respond to screen color changes. Participants also completed a perception questionnaire regarding the mental effort required by the task and overall task difficulty. Sixty-one ESL teachers also provided their expert judgments by answering the perception questionnaire and explaining the reasons behind their answers. The dual-task method was found to be a good measure of cognitive load, with participants’ accuracy on the secondary task being higher on the simple task version than the complex version. However, task complexity effects were not found for reaction time. Both ESL learners’ and teachers’ self-rated perceptions of mental effort and task difficulty provided further support for the validity of task complexity manipulations, with ratings being higher for complex task versions. In short, complex task versions placed greater cognitive load on participants than simple versions. In Sasayama’s (2016) study, the dual-task method, time estimations, and self-ratings of task difficulty and mental effort were employed. Fifty-three adult Japanese learners of English, divided into three groups according to their L2 proficiency, participated in four narrative tasks. The number of elements determined task complexity, with each story involving one, two, four, or nine characters (named Tasks 1, 2, 3, and 4, respectively). For the secondary task, participants responded to letter-color changes. Although differences were not significant, it was found that reaction times on the secondary task for Task 4 were longer than those for Task 1, and those for Task 2 were longer than those for Task 3. 
Results of participants’ time estimations showed that they perceived Task 4 to be more complex than Task 1. However, time estimations for Task 2 were shorter than those for Task 3, suggesting that the former was perceived to be more complex than the latter that involved more characters in the story. With regard to self-assessments, Task 4 was found to be significantly more difficult and to require more mental effort than the other three tasks, and Task 2 was perceived to be more complex than Task 1. Different response patterns were found when comparing high- and low-proficiency groups, suggesting an interaction effect of L2 proficiency, task complexity, and measure of cognitive load. To explain why Task 2 placed greater cognitive demands than Task 3, two possibilities were suggested: storyline and picture quality, and code complexity. The three studies above used a combination of methods to see whether the intended complex tasks were actually cognitively complex. In general, evidence from these measures showed that complex tasks imposed greater cognitive load on the learner. Among the studies, only Révész et al. (2014) further examined learning outcomes, and hypothesized that those who received recasts while performing complex tasks (+reasoning demands) would show greater gains in L2 development than those who received recasts during simple tasks (−reasoning demands). However, counter to the authors’ predictions, there was no significant difference between the two groups in terms of written production, and the simple task group outperformed the complex task group in terms of oral production. Moreover, Révész et al. (2014) and Sasayama (2016) employed only one task-type, which raises questions about the generalizability of the results. 
Bearing these points in mind, the present study employed a combination of measures to validate task complexity manipulations of three task-types, and investigated the effects of task complexity on native speaker oral production in terms of syntactic complexity, lexical diversity, and lexical sophistication.

THE PRESENT STUDY

In light of the limitations of previous research, the present work sought to answer four research questions.

RQ1. Do task complexity manipulations along the ± number of elements dimension lead to systematic changes in self-ratings of cognitive load?
RQ2. Do task complexity manipulations along the ± number of elements dimension lead to systematic changes in duration judgments on tasks?
RQ3. Do task complexity manipulations along the ± number of elements dimension lead to systematic changes in dual-task measures?
RQ4. Do task complexity manipulations along the ± number of elements dimension lead to systematic changes in native speakers’ oral production?

METHOD

Participants

Forty-two native speakers of English (18 males, 24 females) enrolled at a university in the USA participated in the study. Their ages ranged from 19 to 41 years at the time of study (M = 26.14, SD = 4.646).

Tasks

Three types of oral tasks were employed to maximize generalizability of findings and avoid type–token confounds. A resource-directing dimension of the CH, ± few elements, was manipulated such that each task-type had three versions of task complexity: least complex, mid-complex, and most complex. In an MT, the participant had to find the quickest route from one place to another and tell an imaginary friend how to drive there. The number of obstacles on the road (e.g. a no-turn sign, one-way street, closed road, construction site, etc.) was manipulated so that an increasing number of obstacles forced participants to find a more complex route.
In a Seating Arrangement task (SAT), the participant had to arrange the best seating plan for a number of people with certain preferences. It was assumed that the greater number of people and preferences would increase task complexity. In a Car Accident task (CAT), the learner watched a video clip of a car accident scene three times and reported the accident by pretending to be a news reporter. The number of cars and people involved in the accident determined the complexity of the task. As much as 2 min were provided as planning time for the MT and CAT, and as much as 5 min for the SAT. Table 1 illustrates in detail the number of elements involved in each task-type and version. Measures of cognitive load After performing each version of a task, participants completed a questionnaire in which nine-point Likert scales were used to answer questions about (i) overall perceived task difficulty, (ii) level of mental effort they thought was required for task performance, and (iii) level of stress they felt during task performance. They were also asked to estimate the time it took for planning and performing the task separately. Although it was assumed that both times would be affected by the complexity of the task, precaution of separating the two was taken to prevent participants’ awareness of preparation time from confounding the results. These time estimations were later used to calculate the ratio of subjective duration to objective duration as one measure of cognitive load. It was in the prospective paradigm, as participants were aware prior to, or immediately upon, performance that they needed to make a duration judgment. As part of the dual-task method, participants performed a secondary simple choice reaction task while simultaneously performing the primary oral task. Screen color changes were employed to ensure that differences in the performance of the secondary task were a reflection of changes at a cognitive level, and not a perceptual level (Cierniak et al. 
2009). While participants were performing the primary task, the laptop screen before them changed color from white to either green or red at intervals of 2,500 ms. Participants were required to react as quickly and accurately as possible to the color changes, pressing the left shift key if the screen changed from white to green, and the right shift key if it changed from white to red. The primary and secondary tasks were run through DMDX, and participants’ error rates and reaction times were recorded. Accuracy was calculated by dividing the number of correct responses to color changes by the total number of changes. Only correct responses were considered for reaction time.

Procedure

Participants met with the researcher individually for a 1-h session. They first completed a language background questionnaire (adapted from Ellis 2011), and then performed a series of practice items whose format was identical to the tasks in the test phase. A sample item of each task-type was provided. To prevent sequencing effects, the order of the tasks was pseudo-randomized such that three blocks containing one version of each task-type were scrambled (see Table 2). While performing the primary oral tasks, participants responded to screen color changes as a secondary task. Following each task version, they completed a questionnaire regarding cognitive load self-ratings and estimated the time they had spent on planning and speech during task performance.
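The two derived cognitive-load measures described under Measures of cognitive load (the duration judgment ratio, and dual-task accuracy with reaction time on correct responses) can be sketched as follows. This is a minimal illustration, not the study's actual scoring script: the function names, the trial tuple layout, and the example values are hypothetical, and only the key mapping and the two formulas come from the text.

```python
from statistics import mean

# Key mapping reported in the text: left shift for green, right shift for red.
KEY_FOR_COLOR = {"green": "left_shift", "red": "right_shift"}


def duration_judgment_ratio(estimated_s, actual_s):
    """Ratio of subjective (estimated) to objective (clock) duration."""
    if actual_s <= 0:
        raise ValueError("objective duration must be positive")
    return estimated_s / actual_s


def dual_task_summary(trials):
    """Summarize secondary-task performance for one task version.

    trials: list of (color, pressed_key, rt_ms) tuples, one per color change;
    pressed_key and rt_ms are None when the participant did not respond
    (hypothetical layout, assumed for this sketch).
    Returns (accuracy, mean RT over correct responses or None).
    """
    if not trials:
        raise ValueError("no color changes recorded")
    # Only correct responses contribute to reaction time.
    correct_rts = [rt for color, key, rt in trials if key == KEY_FOR_COLOR[color]]
    # Accuracy = correct responses / total number of color changes.
    accuracy = len(correct_rts) / len(trials)
    mean_rt = mean(correct_rts) if correct_rts else None
    return accuracy, mean_rt


# Invented example data, for illustration only.
example_trials = [
    ("green", "left_shift", 1500.0),
    ("red", "right_shift", 1700.0),
    ("green", "right_shift", 900.0),  # wrong key: counted as an error
    ("red", None, None),              # missed color change
]
print(duration_judgment_ratio(90.0, 60.0))
print(dual_task_summary(example_trials))
```

A ratio above 1 means the participant judged the interval to be longer than it was; a ratio below 1 means it was judged shorter.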
Table 1: Number of task elements

| Task-type | Type of element | Least complex | Mid-complex | Most complex |
|---|---|---|---|---|
| MT | Obstacles | 0 | 2 | 4 |
| SAT | Guests | 4 | 6 | 8 |
| CAT | Cars and people | 1 | 3 | 10 |

Table 2: Sample of task randomization

| Participant | 1st | 2nd | 3rd | 4th | 5th | 6th | 7th | 8th | 9th |
|---|---|---|---|---|---|---|---|---|---|
| 1 | CAT 2 | SAT 2 | MT 2 | SAT 1 | MT 1 | CAT 3 | MT 3 | SAT 3 | CAT 1 |
| 2 | SAT 2 | MT 2 | CAT 1 | SAT 1 | MT 3 | CAT 3 | MT 1 | CAT 2 | SAT 3 |
| 3 | MT 3 | SAT 1 | CAT 2 | MT 1 | CAT 3 | SAT 2 | MT 2 | CAT 1 | SAT 3 |
| 4 | MT 1 | SAT 2 | CAT 1 | SAT 3 | CAT 2 | MT 2 | SAT 1 | CAT 3 | MT 3 |
| 5 | MT 3 | SAT 3 | CAT 3 | MT 2 | CAT 2 | SAT 1 | MT 1 | CAT 1 | SAT 2 |
| 6 | CAT 1 | MT 3 | SAT 2 | CAT 3 | SAT 1 | MT 1 | CAT 2 | MT 2 | SAT 3 |

Note: Columns give the position in the task sequence. 1 = least complex; 2 = mid-complex; 3 = most complex.
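One way to generate sequences with the block structure visible in Table 2 is sketched below. The text states only that three blocks, each containing one version of each task-type, were scrambled; the specific procedure here (shuffling each task-type's versions across blocks, then shuffling the order within each block) is an assumption, and `make_sequence` is a hypothetical helper, not the authors' tool.

```python
import random

TASK_TYPES = ("MT", "SAT", "CAT")
VERSIONS = (1, 2, 3)  # 1 = least complex, 2 = mid-complex, 3 = most complex


def make_sequence(rng):
    """Return a 9-item task sequence made of three scrambled blocks.

    Each block contains exactly one MT, one SAT, and one CAT item, and each
    task-type contributes each of its three versions exactly once overall.
    """
    # For every task-type, decide at random which version goes in which block.
    versions_by_task = {t: rng.sample(VERSIONS, k=3) for t in TASK_TYPES}
    sequence = []
    for block in range(3):
        block_items = [(t, versions_by_task[t][block]) for t in TASK_TYPES]
        rng.shuffle(block_items)  # scramble task order within the block
        sequence.extend(block_items)
    return sequence


# A seeded generator makes a participant's sequence reproducible.
sequence = make_sequence(random.Random(42))
print(" ".join(f"{task} {version}" for task, version in sequence))
```

Every row of Table 2 satisfies the same two constraints that this sketch enforces: one item of each task-type per block, and all nine task versions used exactly once.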
Linguistic outcome measures

Participants’ speech production was assessed in terms of syntactic complexity, lexical diversity, and lexical sophistication. Number of clauses per AS-unit, number of subordinate clauses per AS-unit, and mean length of AS-unit (number of words per AS-unit) were used to measure syntactic complexity. Lexical diversity was measured using Guiraud’s (1954) Index of Richness (a mathematical transformation of the type–token ratio that takes text length into consideration), MTLD (the Measure of Textual Lexical Diversity), and VOCD (also known as the D-measure). To assess lexical sophistication, the proportion of academic words in speech was analyzed by investigating frequency bands: the most common 1,000 words (K1), the next most common 1,000 words (K2), the academic words of English (the AWL, 550 frequent words in academic texts, Coxhead 2000), and off-list words (the remainder not found on the other lists). Two raters independently scored the entirety of the spoken data.
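As an illustration of the first lexical diversity measure, Guiraud's Index divides the number of word types by the square root of the number of tokens, which dampens the text-length sensitivity of the plain type–token ratio. The sketch below is illustrative only: the regular-expression tokenizer is a simplifying assumption (the study counted types and tokens with dedicated software), and the function names are hypothetical.

```python
import math
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (simplified rule)."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(tokens):
    """Plain type-token ratio: distinct word forms / running words."""
    if not tokens:
        raise ValueError("empty text")
    return len(set(tokens)) / len(tokens)


def guiraud_index(tokens):
    """Guiraud's Index of Richness: types / sqrt(tokens)."""
    if not tokens:
        raise ValueError("empty text")
    return len(set(tokens)) / math.sqrt(len(tokens))


sample = "Turn right on Main Street. Turn left on Lincoln Avenue."
tokens = tokenize(sample)
print(len(tokens), len(set(tokens)))  # tokens, types
print(round(type_token_ratio(tokens), 3), round(guiraud_index(tokens), 3))
```

The sample has 10 tokens and 8 types, so the type–token ratio is 8/10 and Guiraud's Index is 8/√10.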
Inter-rater reliability (Krippendorff's alpha) was .874 for number of clauses per AS-unit, .708 for number of subordinate clauses per AS-unit, and .885 for mean length of AS-unit, indicating good agreement between the raters. Discrepancies were later reviewed, reconciled, and recoded. Types and tokens were counted using WordSmith, a lexical analysis program, and the counts were then used to calculate Guiraud’s Index as one measure of lexical diversity. MTLD and VOCD were calculated using Coh-Metrix, a Web-based software tool (McNamara et al. 2013). VocabProfile (Cobb 2002), a Web-based software tool that performs lexical text analysis, was used to analyze vocabulary frequency bands. Outliers, defined as items three SDs from the mean (3 from self-ratings, 10 from duration judgments, 17 from dual-task method results, and 11 from linguistic outcome measures), were detected and excluded from analyses.

RESULTS

Self-ratings of cognitive load

Table 3 shows the descriptive statistics for self-ratings of perceived task difficulty, mental effort, and stress. Regardless of task-type, participants felt that the more complex tasks were more difficult, required more mental effort, and were more stressful. Figure 1 plots the changes in self-ratings.
Table 3: Mean and SD of self-ratings

| Measure | MT 1 | MT 2 | MT 3 | SAT 1 | SAT 2 | SAT 3 | CAT 1 | CAT 2 | CAT 3 |
|---|---|---|---|---|---|---|---|---|---|
| Perceived difficulty | 3.71 (1.38) | 5.02 (1.73) | 6.24 (1.49) | 4.57 (1.88) | 5.88 (1.38) | 7.95 (1.09) | 4.38 (1.55) | 5.17 (1.58) | 6.86 (1.37) |
| Mental effort | 3.67 (1.52) | 5.05 (1.90) | 5.86 (1.62) | 4.50 (1.81) | 5.79 (1.52) | 7.75 (1.03) | 4.33 (1.68) | 5.31 (1.81) | 6.62 (1.45) |
| Stress | 3.43 (1.38) | 4.57 (1.82) | 4.88 (1.95) | 4.02 (1.98) | 4.67 (1.72) | 6.55 (1.89) | 3.71 (1.40) | 4.48 (1.70) | 5.40 (1.80) |

Note: Values are M (SD). Complexity 1 indicates least complex, 2 mid-complex, and 3 most complex.
Figure 1: Task complexity increases and self-ratings

To answer the first research question, three linear mixed models were run on the self-ratings of cognitive load. The fixed-effects variables were task-type and task complexity, and the random-effects variables were participants and task items. To filter out unnecessarily complicated models, the model with the lowest Schwarz’s Bayesian information criterion (BIC) was chosen from the full set of candidates. As a result, the Compound Symmetry model was selected for all three models. Restricted maximum likelihood estimates were used.

In the case of perceived difficulty, significant main effects were found for task-type and task complexity, F(2, 326.774) = 24.119, p < .001 and F(2, 326.774) = 146.665, p < .001, respectively. However, their interaction was not significant, F(4, 326.773) = 2.056, p = .086. Marginal and conditional R² values were R²GLMM(m) = .33 and R²GLMM(c) = .58. Pairwise comparisons showed that when task complexity was factored out, the SAT was considered significantly more difficult than the MT and CAT, and the CAT was perceived to be significantly more difficult than the MT. When task-type was ignored, there were significant differences between each level of complexity. In other words, the most complex task versions were perceived to be the most difficult, followed by the mid-complex, and then the least complex versions.

When a mixed model was run with mental effort as the outcome variable, a significant interaction between task-type and task complexity was obtained, F(4, 325.762) = 2.414, p = .049, indicating that task complexity had different effects on mental effort depending on task-type.
A simple effects test was then conducted, and results showed that the most complex versions of the SAT and the CAT required significantly more mental effort than the mid-complex versions, which in turn required significantly more than the least complex versions (for the SAT, p < .001 for all comparisons; for the CAT, p < .001 for comparisons between the most complex versions and the other two versions, and p = .003 for the comparison between the mid-complex and least complex versions). In the case of the MT, the least complex version was considered to require significantly less mental effort than the two more complex versions (p < .001 for both comparisons). Significant main effects for task-type and task complexity were also found, F(2, 325.763) = 22.561, p < .001 and F(2, 325.763) = 115.517, p < .001, respectively. Pairwise comparisons revealed significant differences between each level of task-type and task complexity. Again, the greatest level of mental effort was required for the most complex task versions and the SAT. The CAT required more effort than the MT, and participants felt they expended more mental effort on the mid-complex task versions than on the least complex versions. Marginal and conditional R² values were R²GLMM(m) = .28 and R²GLMM(c) = .57.

A mixed model run on stress ratings obtained significant main effects for task-type and task complexity, as well as a significant interaction, F(2, 328.00) = 12.843, p < .001; F(2, 328.00) = 70.812, p < .001; and F(3, 328.00) = 4.445, p < .005, respectively. Marginal and conditional R² values were R²GLMM(m) = .15 and R²GLMM(c) = .60. Because there was a significant interaction effect, a simple effects test was conducted. It was found that the most and mid-complex versions of the MT were significantly more stressful than the least complex version (p < .001 for both comparisons).
In the case of the SAT, the most complex version was significantly more stressful than the two less complex versions (p < .001 for both comparisons). The most complex CAT version was found to be significantly more stressful than the mid-complex (p = .001) and least complex versions (p < .001), and the mid-complex version was significantly more stressful than the least complex version (p < .001). When task complexity was ignored, the SAT was significantly more stressful than the MT and CAT. When task-type was factored out, significant differences were found between each level of task complexity, with stress ratings increasing as task complexity increased.

Prospective duration judgments

Table 4 and Figure 2 display the descriptive statistics and patterns for planning and speech duration judgment ratios. Planning ratios for the CAT showed a very different pattern from those of the MT and SAT, which seemed to be consistent across all task complexity levels. Speech ratios for the CAT were also slightly different, but there appears to be a general pattern in that speech duration judgment ratios decreased as task complexity increased.

Table 4: Mean and SD of duration judgment ratios

| Measure | MT 1 | MT 2 | MT 3 | SAT 1 | SAT 2 | SAT 3 | CAT 1 | CAT 2 | CAT 3 |
|---|---|---|---|---|---|---|---|---|---|
| Planning (s) | 1.25 (0.86) | 1.29 (0.70) | 1.18 (0.57) | 1.06 (0.49) | 1.09 (0.44) | 0.98 (0.21) | 7.47 (8.79) | 8.51 (9.44) | 10.31 (10.73) |
| Speech (s) | 1.59 (0.89) | 1.72 (0.95) | 1.37 (0.82) | 1.52 (0.91) | 1.53 (0.79) | 1.26 (0.63) | 1.86 (1.23) | 1.64 (0.94) | 1.58 (0.96) |

Note: Values are M (SD). Complexity 1 indicates least complex, 2 mid-complex, and 3 most complex.
Figure 2: Task complexity increases and duration judgment ratios

Two separate linear mixed models were run, with the Heterogeneous Compound Symmetry model chosen for the planning duration judgment ratio, and the Compound Symmetry model selected for the speech duration judgment ratio. Again, restricted maximum likelihood estimates were used. With respect to planning, a significant main effect was found for task-type, F(2, 84.981) = 27.505, p < .001. However, the main effect for task complexity and the interaction between the two were not significant, F(2, 77.657) = 1.107, p = .336 and F(4, 98.832) = .638, p = .637, respectively. Marginal and conditional R² values were R²GLMM(m) = .21 and R²GLMM(c) = .38. Pairwise comparisons revealed that the CAT produced a significantly higher ratio than the other two task-types, with the ratio for the MT being significantly higher than that for the SAT. When the speech duration judgment ratio was the outcome variable, significant main effects were found for task-type and task complexity, F(2, 305.487) = 5.158, p < .01 and F(2, 305.487) = 8.600, p < .001, respectively.
However, their interaction was non-significant, F(4, 305.624) = 1.314, p = .265. Marginal and conditional R² values were R²GLMM(m) = .02 and R²GLMM(c) = .64. Pairwise comparisons revealed that when task complexity was ignored, the speech ratio for the CAT was significantly higher than that for the SAT. When task-type was factored out, the speech ratio for the most complex task versions was significantly lower than those for the mid-complex and least complex versions.

Dual-task outcome measures

Descriptive statistics for dual-task outcome measures are displayed in Table 5. Participants’ accuracy seemed to be consistent across all task-types and versions. As illustrated in Figure 3, reaction times appear to have increased with greater task complexity, with the exception of the CAT.

Table 5: Mean and standard deviation of dual-task outcome measures

| Measure | MT 1 | MT 2 | MT 3 | SAT 1 | SAT 2 | SAT 3 | CAT 1 | CAT 2 | CAT 3 |
|---|---|---|---|---|---|---|---|---|---|
| Accuracy | 0.97 (0.07) | 0.96 (0.06) | 0.96 (0.06) | 0.97 (0.08) | 0.96 (0.06) | 0.95 (0.06) | 0.97 (0.07) | 0.98 (0.04) | 0.95 (0.07) |
| Reaction time (ms) | 1,590.27 (921.10) | 1,725.23 (839.97) | 1,756.67 (761.97) | 1,590.92 (941.28) | 1,742.63 (720.68) | 2,428.02 (1,514.83) | 1,385.98 (652.15) | 1,185.54 (358.04) | 1,448.66 (586.23) |

Note: Values are M (SD). Complexity 1 indicates least complex, 2 mid-complex, and 3 most complex.
Figure 3: Task complexity increases and dual-task outcomes

To see whether task-type and task complexity had an effect on the secondary task outcomes, two separate linear mixed models were run with participants and task items as random effects. Using Schwarz’s BIC, the Compound Symmetry model was selected for accuracy and the Heterogeneous Compound Symmetry model was chosen for reaction time. Restricted maximum likelihood estimates were used. In the case of accuracy, the main effects for task complexity and task-type and their interaction were not significant, F(2, 304.083) = 2.995, p = .051; F(2, 304.083) = .482, p = .618; and F(4, 304.240) = .714, p = .583, respectively. Marginal and conditional R² values were R²GLMM(m) = .01 and R²GLMM(c) = .14. On the other hand, for reaction time, significant effects were found for task-type, task complexity, and their interaction, F(2, 91.743) = 22.341, p < .001; F(2, 90.087) = 13.898, p < .001; and F(4, 125.718) = 4.452, p < .005, respectively.
Marginal and conditional R² values were R²GLMM(m) = .04 and R²GLMM(c) = .67. Due to the significant interaction, a simple effects test was conducted. Results showed that the reaction time on the most complex MT was significantly longer than that on the least complex version (p = .013), and the reaction time on the most complex SAT was significantly longer than those on the mid-complex (p = .005) and least complex versions (p < .001). The reaction time on the mid-complex SAT version was also significantly longer than that on the least complex version (p = .027). Regarding the significant main effects, pairwise comparisons revealed that when task complexity was ignored, reaction times were longest on the SAT, followed by the MT and then the CAT, with significant differences between task-types. When task-type was factored out, reaction times were significantly longer on the most complex task versions than on the two less complex versions.

Linguistic outcome measures

Descriptive statistics for the six linguistic outcome measures are presented in Table 6. Table 7 displays descriptive statistics for the measure of lexical sophistication. As depicted in Figure 4, there seemed to be a general inverted V-shaped pattern for syntactic complexity outcome measures as task complexity increased. In other words, the mid-complex task versions seem to have generated the most complex structures. On the other hand, Figure 5 shows that lexical diversity seemed to increase with greater task complexity.
Table 6: Mean and standard deviation of linguistic outcome measures

| Measure | MT 1 | MT 2 | MT 3 | SAT 1 | SAT 2 | SAT 3 | CAT 1 | CAT 2 | CAT 3 |
|---|---|---|---|---|---|---|---|---|---|
| Clause per AS-unit | 1.80 (0.74) | 1.72 (0.43) | 1.75 (0.62) | 2.65 (0.72) | 3.01 (0.86) | 2.61 (0.92) | 2.31 (0.78) | 2.47 (0.86) | 2.31 (0.73) |
| Subordinate clause per AS-unit | 0.45 (0.50) | 0.43 (0.34) | 0.47 (0.44) | 1.43 (0.61) | 1.72 (0.76) | 1.41 (0.74) | 0.92 (0.63) | 1.18 (0.73) | 0.89 (0.50) |
| Mean length of AS-unit | 14.52 (5.42) | 14.28 (4.72) | 14.46 (6.04) | 21.45 (5.52) | 21.20 (6.05) | 19.50 (4.75) | 22.41 (10.29) | 19.58 (6.23) | 20.42 (7.29) |
| Guiraud’s Index | 4.72 (0.50) | 4.71 (0.47) | 4.70 (0.60) | 5.25 (0.65) | 5.17 (0.79) | 5.56 (0.85) | 5.35 (0.84) | 5.71 (0.71) | 5.79 (0.70) |
| MTLD | 32.67 (0.95) | 32.14 (0.85) | 33.53 (1.06) | 39.67 (1.32) | 35.85 (1.50) | 40.78 (1.71) | 48.92 (2.49) | 48.10 (2.02) | 47.65 (2.56) |
| VOCD | 7.14 (1.92) | 16.20 (2.00) | 19.92 (2.09) | 21.95 (3.13) | 24.46 (2.48) | 37.51 (2.04) | 10.12 (3.33) | 16.15 (3.78) | 26.57 (3.86) |

Note: Values are M (SD). Complexity 1 indicates least complex, 2 mid-complex, and 3 most complex.
Table 7: Average percentage and standard deviation of word bands

| Word band | MT 1 | MT 2 | MT 3 | SAT 1 | SAT 2 | SAT 3 | CAT 1 | CAT 2 | CAT 3 |
|---|---|---|---|---|---|---|---|---|---|
| K1 words | 79.25 (5.23) | 81.88 (4.07) | 82.88 (3.75) | 91.01 (4.24) | 84.11 (4.37) | 88.18 (5.20) | 81.19 (5.09) | 76.92 (4.98) | 77.86 (4.60) |
| K2 words | 7.94 (3.59) | 8.02 (2.62) | 6.29 (2.28) | 4.88 (2.19) | 10.28 (5.17) | 3.10 (2.86) | 4.78 (3.47) | 8.24 (3.57) | 8.22 (2.60) |
| AWL words | 0.42 (1.19) | 0.23 (0.47) | 0.52 (0.93) | 0.48 (0.83) | 0.68 (1.07) | 1.00 (1.15) | 1.00 (1.58) | 1.40 (2.08) | 1.41 (1.56) |
| Off-list words | 12.40 (3.60) | 9.86 (3.55) | 10.31 (3.29) | 3.63 (3.30) | 4.93 (3.81) | 7.72 (4.63) | 13.03 (4.83) | 13.45 (4.24) | 12.51 (3.54) |

Note: Values are mean percentage (SD). Complexity 1 indicates least complex, 2 mid-complex, and 3 most complex.
Table 8: Examples of task simplification and proper task completion

| Task-type | Example of task simplification | Example of good task completion |
|---|---|---|
| MT | Turn right on Main Street. Drive two blocks. Turn left on Lincoln Avenue. Drive one block. Turn left on Downtown Street? Drive one block. Turn right on Jefferson Avenue? Drive one block? Then turn left on Pine Street? Drive one block? Then turn right on Washington Avenue. The post office will be on your right | Okay so to get to the post office you're going to make a right, onto Main Street, which is going to be your first right, because there is road work up ahead. Um after making this right you're going to continue straight passing the pet store supermarket hospital and police station? Because you cannot make a left onto Jefferson Avenue, so once you're on the end of Main Street you're going to make that left onto Lincoln Avenue. And you're going to make your next left that you can, right onto Downtown Street. You will pass a shop, and then you will continue that block and make a right onto Jefferson Avenue? Uh you will see a cinema, if you're going in the correct direction? And you will continue straight until you approach a library, where you will then make a left? And you will see a pizza place and if you continue, you will see the post office on your right, and make that right, across from the playground, and you will find yourself right on Washington Avenue at the post office |
| SAT | Okay so head of State A will sit at the bottom right? Um head of State B will sit to the right of head of State A. Uh head of State G will sit right at the bathroom um and on the top right uh head of State E will sit. Um on the left side at the bottom left head of State H will sit at the bottom? Head of State D will sit to the left of head of State H. Um head of State C will sit to the left of head of State D and head of State C. And head of State F will sit to the left of head of State C | Alright, so, I guess I will start with the closest to the restroom, which will be G because they aren't feeling well and wanna sit close to the restroom. And going around to the left, will be A, who is sitting here because D is on the other side of the room and they're at war with them and don't wanna sit near them. To the left of A, is um B, and they are sitting far enough away from E, where they won't bother each other because they're at war with each other. Uh to the left of B is H, who is going to be late and they wanna be able to sneak in. To the left of H is D, who is at war with A so they're far apart from each other. And D also does not like meat, they will be eating seafood. Uh to the left of D is C, who wanted to be seated next to uh someone they didn't wanna be seated next to a woman who aren't their wives, so they're between two men, which leaves to their left who is F, which is a man, and also wanted uh seafood and wanted to strengthen their terms with C so this works out in everyone's favor? And then lastly is E, to the left of them, who enjoy meat, and yep, and are at war with B. So that's why they wanna sit far away from them |
| CAT | Three vehicles are sliding and spinning down a steep snow-covered hill, a red car, a red SUV, and a white truck. The SUV collides with the white car on the side of the road, and comes to a stop. Moments later, the SUV is struck by a large red four by four, quite severely. The other two vehicles have escaped with minor damages | In this scene, we see an icy road condition uh on a hill, which was obviously not good enough to- for some of these cars to drive. Now, we see three cars sliding down a icy road hill. Um there's a white pickup truck there's a black hatchback and another dark colored sedan. The- um the scene pans out and we see that the sedan hits a white car that is stopped, and we also begin to see four other cars on the scene. Immediately, and with accelerated pace, we then see a red SUV sprinting down the icy road hill. And- and that hits the black hatchback car that we see at the beginning of the screen. The footage shows the other cars that are stopped stationary, most likely unable to drive in these treacherous conditions. However at the same time we do see a Fedex truck that's able to cruise by. Um there's also a few people on the scene we assume that they are the drivers that are unable to uh stay in their car and they wanna stay safe, because of the cars that are sliding down um the icy road hill. Um meanwhile um these people are closing the doors and trying to see if their cars are okay, but these are just not conditions that we should be driving in |

Note: ? indicates rising intonation, not necessarily a question. . indicates falling intonation, end of utterance. , indicates low-rising intonation that suggests continuation. - indicates false-start, self-correction, or self-interruption.
You will pass a shop, and then you will continue that block and make a right onto Jefferson Avenue? Uh you will see a cinema, if you're going in the correct direction? And you will continue straight until you approach a library, where you will then make a left? And you will see a pizza place and if you continue, you will see the post office on your right, and make that right, across from the playground, and you will find yourself right on Washington Avenue at the post office SAT Okay so head of State A will sit at the bottom right? Um head of State B will sit to the right of head of State A. Uh head of State G will sit right at the bathroom um and on the top right uh head of State E will sit. Um on the left side at the bottom left head of State H will sit at the bottom? Head of State D will sit to the left of head of State H. Um head of State C will sit to the left of head of State D and head of State C. And head of State F will sit to the left of head of State C Alright, so, I guess I will start with the closest to the restroom, which will be G because they aren't feeling well and wanna sit close to the restroom. And going around to the left, will be A, who is sitting here because D is on the other side of the room and they're at war with them and don't wanna sit near them. To the left of A, is um B, and they are sitting far enough away from E, where they won't bother each other because they're at war with each other. Uh to the left of B is H, who is going to be late and they wanna be able to sneak in. To the left of H is D, who is at war with A so they're far apart from each other. And D also does not like meat, they will be eating seafood. Uh to the left of D is C, who wanted to be seated next to uh someone they didn't wanna be seated next to a woman who aren't their wives, so they're between two men, which leaves to their left who is F, which is a man, and also wanted uh seafood and wanted to strengthen their terms with C so this works out in everyone's favor? 
And then lastly is E, to the left of them, who enjoy meat, and yep, and are at war with B. So that's why they wanna sit far away from them CAT Three vehicles are sliding and spinning down a steep snow-covered hill, a red car, a red SUV, and a white truck. The SUV collides with the white car on the side of the road, and comes to a stop. Moments later, the SUV is struck by a large red four by four, quite severely. The other two vehicles have escaped with minor damages In this scene, we see an icy road condition uh on a hill, which was obviously not good enough to- for some of these cars to drive. Now, we see three cars sliding down a icy road hill. Um there's a white pickup truck there's a black hatchback and another dark colored sedan. The- um the scene pans out and we see that the sedan hits a white car that is stopped, and we also begin to see four other cars on the scene. Immediately, and with accelerated pace, we then see a red SUV sprinting down the icy road hill. And- and that hits the black hatchback car that we see at the beginning of the screen. The footage shows the other cars that are stopped stationary, most likely unable to drive in these treacherous conditions. However at the same time we do see a Fedex truck that's able to cruise by. Um there's also a few people on the scene we assume that they are the drivers that are unable to uh stay in their car and they wanna stay safe, because of the cars that are sliding down um the icy road hill. Um meanwhile um these people are closing the doors and trying to see if their cars are okay, but these are just not conditions that we should be driving in Task-type Example of task simplification Example of good task completion MT Turn right on Main Street. Drive two blocks. Turn left on Lincoln Avenue. Drive one block. Turn left on Downtown Street? Drive one block. Turn right on Jefferson Avenue? Drive one block? Then turn left on Pine Street? Drive one block? Then turn right on Washington Avenue. 
The post office will be on your right Okay so to get to the post office you're going to make a right, onto Main Street, which is going to be your first right, because there is road work up ahead. Um after making this right you're going to continue straight passing the pet store supermarket hospital and police station? Because you cannot make a left onto Jefferson Avenue, so once you're on the end of Main Street you're going to make that left onto Lincoln Avenue. And you're going to make your next left that you can, right onto Downtown Street. You will pass a shop, and then you will continue that block and make a right onto Jefferson Avenue? Uh you will see a cinema, if you're going in the correct direction? And you will continue straight until you approach a library, where you will then make a left? And you will see a pizza place and if you continue, you will see the post office on your right, and make that right, across from the playground, and you will find yourself right on Washington Avenue at the post office SAT Okay so head of State A will sit at the bottom right? Um head of State B will sit to the right of head of State A. Uh head of State G will sit right at the bathroom um and on the top right uh head of State E will sit. Um on the left side at the bottom left head of State H will sit at the bottom? Head of State D will sit to the left of head of State H. Um head of State C will sit to the left of head of State D and head of State C. And head of State F will sit to the left of head of State C Alright, so, I guess I will start with the closest to the restroom, which will be G because they aren't feeling well and wanna sit close to the restroom. And going around to the left, will be A, who is sitting here because D is on the other side of the room and they're at war with them and don't wanna sit near them. To the left of A, is um B, and they are sitting far enough away from E, where they won't bother each other because they're at war with each other. 
Uh to the left of B is H, who is going to be late and they wanna be able to sneak in. To the left of H is D, who is at war with A so they're far apart from each other. And D also does not like meat, they will be eating seafood. Uh to the left of D is C, who wanted to be seated next to uh someone they didn't wanna be seated next to a woman who aren't their wives, so they're between two men, which leaves to their left who is F, which is a man, and also wanted uh seafood and wanted to strengthen their terms with C so this works out in everyone's favor? And then lastly is E, to the left of them, who enjoy meat, and yep, and are at war with B. So that's why they wanna sit far away from them CAT Three vehicles are sliding and spinning down a steep snow-covered hill, a red car, a red SUV, and a white truck. The SUV collides with the white car on the side of the road, and comes to a stop. Moments later, the SUV is struck by a large red four by four, quite severely. The other two vehicles have escaped with minor damages In this scene, we see an icy road condition uh on a hill, which was obviously not good enough to- for some of these cars to drive. Now, we see three cars sliding down a icy road hill. Um there's a white pickup truck there's a black hatchback and another dark colored sedan. The- um the scene pans out and we see that the sedan hits a white car that is stopped, and we also begin to see four other cars on the scene. Immediately, and with accelerated pace, we then see a red SUV sprinting down the icy road hill. And- and that hits the black hatchback car that we see at the beginning of the screen. The footage shows the other cars that are stopped stationary, most likely unable to drive in these treacherous conditions. However at the same time we do see a Fedex truck that's able to cruise by. 
Um there's also a few people on the scene we assume that they are the drivers that are unable to uh stay in their car and they wanna stay safe, because of the cars that are sliding down um the icy road hill. Um meanwhile um these people are closing the doors and trying to see if their cars are okay, but these are just not conditions that we should be driving in Note: ? indicates rising intonation, not necessarily a question. . indicates falling intonation, end of utterance. , indicates low-rising intonation that suggests continuation. - indicates false-start, self-correction, or self-interruption. Figure 4: View largeDownload slide Task complexity increases and syntactic complexity outcome measures Figure 4: View largeDownload slide Task complexity increases and syntactic complexity outcome measures Figure 5: View largeDownload slide Task complexity increases and lexical diversity outcome measures Figure 5: View largeDownload slide Task complexity increases and lexical diversity outcome measures Linear mixed models were run for each outcome measure, with task-type and task complexity as fixed effects, and participants and task items as random effects. Finding the smallest value of the Schwarz’s BIC, the Compound Symmetry model was ultimately chosen for the number of clauses per AS-unit and Guiraud’s Index, and the Heterogeneous version of the Compound Symmetry model for the number of subordinate clauses per AS-unit, mean length of AS-unit, MTLD, and VOCD. Restricted maximum likelihood estimates were utilized. Results of a mixed model conducted on number of clauses per AS-unit revealed a significant main effect for task-type, F(2, 310.974) = 75.060, p < .001. The main effect for task complexity and its interaction with task-type were not significant, F(2, 310.974) = 2.711, p = .068 and F(4, 310.973) = 1.542, p = .190, respectively. It was found that marginal and conditional R2s for the fixed effects were R2GLMMm= .08 and R2GLMMc= .51. 
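The marginal and conditional R² values reported for each mixed model can be read as shares of variance: the marginal value reflects the fixed effects alone, and the conditional value reflects fixed and random effects together. The source does not say how these values were computed; the GLMM subscripts suggest the variance-partitioning approach usually attributed to Nakagawa and Schielzeth, and that is an assumption here. The sketch below illustrates that partitioning only. The function name and the variance components are hypothetical, chosen so that the output resembles the values reported for clauses per AS-unit.

```python
from typing import Tuple


def r2_glmm(var_fixed: float, var_random: float, var_residual: float) -> Tuple[float, float]:
    """Return (marginal, conditional) R^2 from three variance components.

    marginal    = fixed / (fixed + random + residual)
    conditional = (fixed + random) / (fixed + random + residual)
    """
    total = var_fixed + var_random + var_residual
    if total <= 0:
        raise ValueError("total variance must be positive")
    marginal = var_fixed / total
    conditional = (var_fixed + var_random) / total
    return marginal, conditional


# Hypothetical variance components (not taken from the study's data).
marginal, conditional = r2_glmm(var_fixed=0.08, var_random=0.43, var_residual=0.49)
print(f"marginal R2 = {marginal:.2f}, conditional R2 = {conditional:.2f}")
```

A large gap between the two values, as in most of the models reported here, indicates that much of the explained variance is attributable to the random effects (participants and task items) rather than to task-type or task complexity.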
The SAT elicited significantly more complex structures than the other two task-types, and the CAT elicited significantly more complex structures than the MT.

In the case of the number of subordinate clauses per AS-unit, significant main effects were found for task-type and task complexity, F(2, 190.964) = 124.212, p < .001 and F(2, 207.378) = 4.10, p < .05, respectively. However, their interaction was not significant, F(4, 164.507) = 2.407, p = .052. Marginal and conditional R² values were R²GLMM(m) = .07 and R²GLMM(c) = .60. Collapsing across task complexity, the SAT produced significantly more complex structures than the other two task-types, and the CAT generated significantly more complex structures than the MT. Collapsing across task-type, the mid-complex task versions elicited significantly more complex structures than the most and least complex versions.

When a mixed model was run on mean length of AS-unit, a significant main effect for task-type was found, F(2, 140.08) = 68.211, p < .001, indicating that the SAT and the CAT elicited significantly more words per AS-unit than the MT. However, the main effect for task complexity and the task-type × task complexity interaction were not significant, F(2, 222.065) = 1.626, p = .199 and F(4, 146.918) = .991, p = .414, respectively. Marginal and conditional R² values were R²GLMM(m) = .14 and R²GLMM(c) = .42.

Results of a mixed model conducted on lexical diversity in terms of Guiraud’s Index showed significant effects for task-type, task complexity, and their interaction, F(2, 312.00) = 80.092, p < .001; F(2, 312.00) = 5.823, p < .005; and F(4, 312.00) = 3.272, p < .05, respectively. Marginal and conditional R² values were R²GLMM(m) = .24 and R²GLMM(c) = .51. The significant interaction indicated that task complexity effects differed depending on task-type.
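For readers less familiar with the measure, Guiraud’s Index is the number of word types divided by the square root of the number of word tokens, which corrects a raw type-token ratio for text length. The following is a minimal sketch of that calculation, not the procedure used in the study: the tokenization rule (lowercased alphabetic strings) and the function name are assumptions, and the treatment of filled pauses and false starts in the actual transcripts is not specified in the source. The sample text is taken from the MT simplification example in Table 8.

```python
import math
import re


def guiraud_index(text: str) -> float:
    """Guiraud's Index: number of types / sqrt(number of tokens)."""
    # Assumed tokenization: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        raise ValueError("no word tokens found")
    types = set(tokens)
    return len(types) / math.sqrt(len(tokens))


sample = (
    "Turn right on Main Street. Drive two blocks. "
    "Turn left on Lincoln Avenue. Drive one block."
)
# 13 types over 16 tokens gives 13 / 4 = 3.25
print(guiraud_index(sample))
```

Because repeated words such as "turn", "on", and "drive" add tokens without adding types, a formulaic response of this kind yields a lower index than a response of the same length that draws on more varied vocabulary.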
A simple effects test was conducted, and it was found that the most complex SAT version elicited significantly more diverse vocabulary than the mid-complex (p = .034) and least complex versions (p = .027). In the case of the CAT, the most and mid-complex versions elicited significantly more diverse vocabulary than the least complex version (p = .004 and p = .002, respectively). Pairwise comparisons revealed that the CAT produced the most diverse vocabulary, followed by the SAT, and then the MT. Collapsing across task-type, the most complex task versions produced significantly more diverse vocabulary than the least complex versions.

In the case of MTLD, a significant main effect was found for task-type, F(2, 159.918) = 61.705, p < .001, but not for task complexity or their interaction, F(2, 142.394) = 1.355, p = .261 and F(4, 109.413) = .843, p = .501, respectively. Marginal and conditional R² values were R²GLMM(m) = .25 and R²GLMM(c) = .30. Pairwise comparisons between task-types showed that the CAT elicited significantly greater lexical diversity than the SAT and the MT, and the SAT elicited significantly greater lexical diversity than the MT.

Results of a mixed model conducted on VOCD revealed significant main effects for task-type and task complexity, F(2, 131.692) = 38.770, p < .001 and F(2, 191.016) = 32.282, p < .001, respectively, but a non-significant interaction, F(4, 133.948) = 1.890, p = .116. Marginal and conditional R² values were R²GLMM(m) = .09 and R²GLMM(c) = .48. Collapsing across task complexity, the SAT elicited significantly more diverse vocabulary than the MT and the CAT (p < .001 for both comparisons).
When task-type was ignored, the most complex versions elicited the most diverse vocabulary, followed by the mid-complex versions, and then the least complex versions (p < .001 for comparisons between the most complex and mid- or least complex versions, and p = .007 for the comparison between the mid-complex and least complex versions).

To find out whether increases in task complexity affected the proportion of academic words (AWL) in speech, a 3 × 3 repeated measures analysis of variance was computed with task-type and task complexity as the within-subjects variables. Because Mauchly's Test of Sphericity indicated that the assumption of sphericity had been violated for task-type, χ²(2) = .711, p = .002, a Huynh–Feldt correction was used. Results revealed a significant main effect for task-type, F(1.604, 62.545) = 14.107, p < .001, partial η² = .266, but a non-significant main effect for task complexity and a non-significant interaction, F(2, 78) = 2.660, p = .076, partial η² = .064 and F(4, 156) = .705, p = .590, partial η² = .018, respectively. Pairwise comparisons showed that participants produced significantly more academic words during the CAT than the SAT and MT (p < .001 and p = .013, respectively), and produced significantly more academic words during the SAT than the MT (p = .020).

DISCUSSION

RQ1. Do task complexity manipulations along the ± number of elements dimension lead to systematic changes in self-ratings of cognitive load?

The study investigated whether increasing task complexity led to changes in self-ratings of perceived difficulty, mental effort, and stress, which were hypothesized to increase as task complexity increased. These predictions were borne out, as the results of statistical analyses showed a positive linear relationship between task complexity and these outcome measures.
Significant interactions between task complexity and task-type were found for mental effort and stress, indicating that increases in task complexity had differential effects on these variables depending on task-type. Moreover, the task-types differed in the cognitive load they imposed: the SAT imposed the greatest load, followed by the CAT, and then the MT. The similar but not identical patterns of results indicate that perceived difficulty, mental effort, and stress are different constructs that should be investigated separately as measures of cognitive load. Furthermore, the significant interactions between task-type and task complexity suggest that participants’ perceptions vary depending on the type of task they perform. Although, for all three tasks, the versions intended to be more complex were perceived as such, the most and mid-complex MT versions did not differ significantly in terms of mental effort and stress ratings. This may be due to the small number of obstacles added in the MT. Had more obstacles been used in the most complex version, there might have been a clearer linear relationship between task complexity and self-ratings of cognitive load.

RQ2. Do task complexity manipulations along the ± number of elements dimension lead to systematic changes in duration judgments on tasks?

The present study is one of the few in SLA research to have used prospective duration judgments to measure cognitive load. According to attentional models in psychology, time estimations are determined by the amount of attention allocated to the processing of temporal information. In the prospective paradigm, attention is assumed to be shared by a non-temporal information processor and a temporal information processor (Block 1992).
Although it is claimed that the two processors focus on different stimuli, many studies have found that some of the same attentional resources are required for processing both temporal and non-temporal information. Accordingly, it is predicted that fewer attentional resources will be allocated to temporal information when the non-temporal processing load is increased (Block 2003). Therefore, a negative linear relationship between prospective duration judgments and the load of non-temporal information processing is assumed. In this study, the ratios of participants’ time estimations to the actual time of planning and speech were used to compare the effects of task complexity across three task-types. It was predicted that the duration judgment ratio would decrease as task complexity increased. Results obtained from statistical analyses on speech duration estimations were consistent with this prediction. Relative to less complex versions, the most complex versions of tasks increased the load of non-temporal processing, resulting in a reallocation of attentional resources away from temporal information and a decrease in speech time estimates. However, this negative relationship was not found in the case of planning-time estimations. One possible explanation could be participants’ awareness of the upper limit on planning time for each task-type: up to 2 min for the MT and CAT, and up to 5 min for the SAT. The computer beeped and the screen changed when time was up, providing additional temporal information to participants. Some of the participants knew or could guess the approximate amount of time they spent on planning, and this may account for the non-significant relationship between task complexity and planning judgment ratios. Significant effects were found for task-type on planning and speech estimations, with the highest ratios obtained for the CAT. During planning, the SAT imposed the greatest information-processing load, followed by the MT and then the CAT.
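The duration judgment ratio used for RQ2 is a participant's estimate of elapsed time divided by the actual elapsed time, so a ratio below 1 indicates underestimation. A minimal sketch of the calculation follows. The function name and all of the durations are hypothetical illustrations, not data from the study; they are arranged only to show the predicted pattern of lower ratios at higher task complexity.

```python
from statistics import mean


def duration_judgment_ratio(estimated_s: float, actual_s: float) -> float:
    """Ratio of estimated to actual duration (both in seconds)."""
    if actual_s <= 0:
        raise ValueError("actual duration must be positive")
    return estimated_s / actual_s


# Hypothetical (estimated, actual) speech durations in seconds.
trials = {
    "least complex": [(60, 60), (50, 40)],
    "mid-complex": [(45, 50), (54, 60)],
    "most complex": [(45, 60), (30, 50)],
}

# Mean ratio per complexity level.
mean_ratios = {
    level: mean([duration_judgment_ratio(est, act) for est, act in pairs])
    for level, pairs in trials.items()
}

for level, ratio in mean_ratios.items():
    print(f"{level}: {ratio:.3f}")
```

Using a ratio rather than the raw estimate makes performances of different lengths comparable, which matters here because planning and speaking times varied across participants and task-types.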
While participants were carrying out the primary task, the CAT was found to be significantly easier for them to process than the SAT. This difference between the CAT and the SAT can be explained by the nature of the tasks. The CAT simply required participants to remember and report details of a car accident video clip, while the SAT required them to arrange the best seating plan and provide reasons for their choices. There were even a few instances where participants rearranged the plan they had originally designed while speaking. Such findings show that certain task-types may be perceived to be more complex than others.

RQ3. Do task complexity manipulations along the ± number of elements dimension lead to systematic changes in dual-task measures?

The dual-task method adopted in this study required participants to respond to screen color changes while simultaneously carrying out the primary task. Task complexity was predicted to affect performance on the dual task such that the number of correct responses would decrease and reaction time would increase as task complexity increased. Results of the statistical analyses were consistent with these predictions: reaction times on the most complex versions were significantly longer than those on the mid- and least complex versions. Furthermore, a significant interaction between task-type and task complexity showed that for the MT and the SAT, the most complex versions placed greater cognitive load on participants than the least complex versions. In fact, the SAT was the most effective at capturing differences in cognitive complexity among the three task versions. Similar to the findings regarding participant self-ratings, task complexity effects were found to be moderated by the types of tasks employed. These findings are slightly different from those of Révész et al. (2014) and Révész et al. (2015).
In their studies, accuracy rates decreased when participants performed the complex task versions, but no significant difference was found for reaction time. They concluded that accuracy is a more sensitive measure of cognitive load than reaction time for dual-task methods using screen color changes. In contrast, the present study found stronger effects for reaction time than accuracy. A possible explanation may lie in the difference in the way participants responded to color changes. In the present study, they had to choose between two keys. The two earlier studies required them either to press a key or to ignore the color change. With the additional option in the present study, ‘the required level of interference’ (Révész et al. 2015, p. 29) may have been created, so that reaction time could capture the cognitive load of the primary task. Furthermore, the participants in the present study were required to press the left shift key in response to screen color changes from white to green, and the right shift key in response to color changes from white to red. The selection of the left and right shift keys for the color changes was deliberate because it was considered to be counterintuitive: normally, green means ‘go’, which is usually associated with a right-side key, and red means ‘stop’, which is usually associated with a left-side key. Such key-color assignments may have been complex enough to capture different reaction times according to the complexity of the primary task.

RQ4. Do task complexity manipulations along the ± number of elements dimension lead to systematic changes in native speakers’ oral production?

So far, findings provide support for the validity of task complexity manipulations, in that complex task versions placed greater cognitive load on participants than the simpler versions, as indicated by self-ratings of cognitive load, prospective time judgments, and dual-task method outcomes.
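The two dual-task outcome measures discussed under RQ3, accuracy and reaction time, can be derived from a log of secondary-task probes using the key-color assignment described above (left shift for green, right shift for red). The sketch below is illustrative only. The `Probe` record, the `score_probes` function, the probe values, and the decision to average reaction time over correct responses only are all assumptions; the source does not describe its scoring procedure at this level of detail.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional, Tuple

# Key-color assignment as described in the text.
EXPECTED_KEY = {"green": "left_shift", "red": "right_shift"}


@dataclass
class Probe:
    color: str                   # color the screen changed to
    key_pressed: Optional[str]   # None means no response was made
    rt_ms: Optional[float]       # reaction time in ms, None if no response


def score_probes(probes: List[Probe]) -> Tuple[float, Optional[float]]:
    """Return (accuracy, mean RT in ms over correct responses)."""
    if not probes:
        raise ValueError("no probes to score")
    correct = [p for p in probes if p.key_pressed == EXPECTED_KEY[p.color]]
    accuracy = len(correct) / len(probes)
    # Assumption: RT is averaged over correct responses only.
    mean_rt = mean([p.rt_ms for p in correct]) if correct else None
    return accuracy, mean_rt


# Hypothetical probe log for one task performance.
probes = [
    Probe("green", "left_shift", 520.0),   # correct
    Probe("red", "right_shift", 610.0),    # correct
    Probe("red", "left_shift", 700.0),     # wrong key
    Probe("green", None, None),            # missed probe
]
accuracy, mean_rt = score_probes(probes)
print(accuracy, mean_rt)
```

Under a two-key design of this kind, a wrong key press and a missed probe both lower accuracy, whereas in a press-or-ignore design only misses and false alarms can occur, which is one way the response format could change the relative sensitivity of the two measures.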
Another purpose of the study was to find out whether such manipulations had a positive influence on the syntactic complexity, lexical diversity, and lexical sophistication of participants’ oral production. Although significant task complexity effects were not found for the number of clauses per AS-unit and mean length of AS-unit, participants were found to produce a greater number of subordinate clauses per AS-unit when performing the mid-complex task versions than when performing the most complex and least complex versions. Moreover, the SAT generally elicited the most complex speech, followed by the CAT, and then the MT. Different patterns were found in the case of lexical diversity. Measures of Guiraud’s Index and VOCD showed that the most lexically diverse speech was produced during performance of the most complex task versions. Task complexity effects on Guiraud’s Index were also moderated by task-type, such that certain tasks showed greater complexity effects than others: the most complex versions of the SAT and the CAT were significantly better at eliciting diverse vocabulary. All measures of lexical diversity showed that vocabulary was affected by task-type such that the CAT elicited the greatest lexical diversity, followed by the SAT, and then the MT. Participants also produced the highest proportion of academic words while performing the CAT, most likely because the task required them to pretend to be a news reporter, thus driving them to use language that is more formal than that used when giving directions to a friend, as in the MT. Although a linear pattern was shown for lexical diversity, an inverted V-shaped pattern was found for the number of subordinate clauses per AS-unit. Contrary to the original prediction, the mid-complex task versions elicited the most syntactically complex structures.
The possibilities that the mid-complex versions were actually the most complex, or that the most complex versions were not in fact complex, were ruled out by the validating findings from the cognitive load measures. A possible explanation is that participants perceived the most complex task versions to be so complex that they short-circuited the task and simplified it, either intentionally ignoring the added elements or unintentionally failing to notice them. Regardless of the reason behind this task simplification, participants were still able to complete the task in a minimally satisfactory manner and move on to the next one. For the MT, many participants did not give explanations for their choices of routes at any level of task complexity. For the SAT, many did not provide reasons for their decisions on the most complex version and merely produced simple sentences. For the CAT, perhaps due to the low resolution of the video clip, many did not mention each vehicle in detail and simply ignored some of the added elements in the most complex version. Consequently, the most complex task versions failed to generate the most complex syntactic structures. In fact, they produced speech of lower syntactic complexity than the mid-complex versions. Both for laboratory research and classroom teaching materials, it will clearly be necessary to design tasks in ways that prevent participants from ignoring elements in more complex versions, perhaps by building in task-internal feedback loops. Table 8 shows examples of task simplification and examples of proper task completion.

Task-type was found to have a differential effect on participants, with the SAT eliciting the most complex structures, followed by the CAT and the MT. Whereas the less effective tasks set up a situation in which participants provided instructions (MT) or gave a report (CAT) to an unseen, imaginary audience, the SAT did not involve such an audience and was similar to a think-aloud task.
The presence or absence of an audience may be one reason for the differences in task-type effectiveness. In a controlled monologic task, such as the SAT, the exclusion of the online feedback from a live interlocutor that is typical in interactive tasks, for example, expressions of comprehension or lack thereof, may also result in potentially important differences in task-type effects.

Conclusion and limitations

In TBLT, it is crucial to identify principled criteria with which task-types can be classified and pedagogic tasks sequenced. The literature on the impact of task complexity on linguistic CAF measures is extensive, but results of empirical studies have been inconsistent and sometimes contradictory. To find out whether task manipulations actually lead to cognitive load changes, which in turn are assumed to produce positive changes in language production, a combination of self-ratings of cognitive load, time estimations, and the dual-task method was employed in this study. Findings provide support for the claim that increasing task complexity leads to systematic changes in cognitive load. In addition, task-type was found to play a significant role in the effects of task complexity manipulations. However, the most complex versions of tasks failed to elicit the most syntactically complex speech. Bearing that in mind, future studies might investigate the characteristics of the task-types that are most effective and look into how feedback can be built into materials in such a way that participants will not be able to simplify complex tasks. Greater insight into these issues could be key to task classification and sequencing in TBLT and other kinds of communicative language teaching. The present study was small-scale, in a laboratory setting, with 42 native English speakers as participants.
Native speakers were recruited so as to obtain clear evidence, unfiltered through non-native competence, that task complexity effects are real before assessing their effects on L2 performance. Various learner characteristics, such as L2 proficiency, linguistic aptitude, and working memory, may be important moderators of task complexity effects on L2 production. Once the increasing complexity of a set of tasks has been established empirically, a second phase of work can begin with L2 learners. Future research should compare the baseline performance of native speakers with that of L2 learners, thereby obtaining more confidence in the validity of task complexity manipulations and the effects of task complexity on language development. In this respect, this study should be regarded as the initial phase of a larger study to be conducted in the future.

The present study contains several methodological weaknesses that should be addressed in further research. First, all participants were subjected to the dual-task methodology, which may have imposed an extra load on them. It would have been better to have had separate groups in the study: those who participated in the dual-task method, and those who did not. Furthermore, although extra care was taken when operationalizing task complexity so that only ±number of elements was manipulated, there is a possibility that reasoning demands may have been affected as a result. There is a lack of clear guidance on how to operationalize task complexity in previous research, and ±number of elements and ±Here-and-Now are among the easiest to manipulate, considering that their names are fairly self-explanatory. Nonetheless, extra cautionary steps should be taken when designing tasks and manipulating task complexity because other dimensions may be affected as well.

SUPPLEMENTARY DATA

Supplementary material is available at Applied Linguistics online.
Jiyong Lee is a PhD student in the Second Language Acquisition program at the University of Maryland. Her research interests include task complexity effects on L2 performance; the relationships among task complexity, language aptitude, working memory, and negative feedback; and age effects and maturational constraints in SLA. Address for correspondence: Jiyong Lee, University of Maryland, 1102 Francis Scott Key Hall, College Park, MD 20742, USA. <jlee0123@umd.edu>

Acknowledgements

The author would like to express her greatest appreciation to Dr Michael Long for his valuable suggestions and guidance through the planning and development of this research work. The author would also like to thank Dr Steven Ross and Dr Dan McNeish for their advice on data analyses. The author’s gratitude also extends to Dr Nan Jiang for his support for this project. Special thanks to several colleagues and friends at the University of Maryland who provided valuable feedback. The author is also grateful to the anonymous reviewers for their efforts and insightful suggestions.

References

Baralt M. L. 2013. ‘The impact of cognitive complexity on feedback efficacy during online versus face-to-face interactive tasks,’ Studies in Second Language Acquisition 35: 689–725.
Block R. A. 1992. ‘Prospective and retrospective duration judgment: The role of information processing and memory’ in Macar F., Pouthas F., Friedman W. J. (eds): Time, Action and Cognition: Towards Bridging the Gap. Springer Science & Business Media.
Block R. A. 2003. ‘Psychological timing without a timer: The roles of attention and memory’ in Helfrich H. (ed.): Time and Mind II. Hogrefe Publishing.
Block R. A., Hancock P. A., Zakay D. 2010. ‘How cognitive load affects duration judgments: A meta-analytic review,’ Acta Psychologica 134: 330–43.
Block R. A., Zakay D. 2008. ‘Timing and remembering the past, the present, and the future’ in Grondin S. (ed.): Psychology of Time. Emerald Group Publishing Ltd.
Brown S. W. 1985. ‘Time perception and attention: The effects of prospective versus retrospective paradigms and task demands on perceived duration,’ Perception and Psychophysics 38: 115–24.
Brunken R., Plass J. L., Leutner D. 2003. ‘Direct measurement of cognitive load in multimedia learning,’ Educational Psychologist 38: 53–61.
Cierniak G., Scheiter K., Gerjets P. 2009. ‘Explaining the split-attention effect: Is the reduction of extraneous cognitive load accompanied by an increase in germane cognitive load?,’ Computers in Human Behavior 25: 315–24.
Cobb T. 2002. ‘Web Vocabprofile [An adaptation of Heatley, Nation & Coxhead's (2002) Range],’ available at http://www.lextutor.ca/vp/. Accessed August 2017.
Coxhead A. 2000. ‘A new academic word list,’ TESOL Quarterly 34: 213–38.
Ellis D. 2011. ‘The role of task complexity in the linguistic complexity of native speaker output,’ Qualifying paper, PhD in Second Language Acquisition Program. University of Maryland.
Foster P., Skehan P. 1996. ‘The influence of planning and task type on second language performance,’ Studies in Second Language Acquisition 18: 299–323.
Foster P., Tavakoli P. 2009. ‘Native speakers and task performance: Comparing effects on complexity, fluency, and lexical diversity,’ Language Learning 59: 866–96.
Gilabert R. 2007. ‘Effects of manipulating task complexity on self-repairs during L2 oral production,’ International Review of Applied Linguistics in Language Teaching 45: 215–40.
Gilabert R., Barón J., Levkina M. 2011. ‘Manipulating task complexity across task types and modes’ in Robinson P. (ed.): Second Language Task Complexity: Researching the Cognition Hypothesis of Language Learning and Performance. John Benjamins Publishing Company.
Gilabert R., Barón J., Llanes À. 2009. ‘Manipulating cognitive complexity across task types and its impact on learners' interaction during oral performance,’ International Review of Applied Linguistics in Language Teaching 47: 367–95.
Guiraud P. 1954. Les Caractères Statistiques du Vocabulaire. Essai de méthodologie. Presses Universitaires de France.
Ishikawa T. 2011. ‘Examining the influence of intentional reasoning demands on learner perceptions of task difficulty and L2 monologic speech’ in Robinson P. (ed.): Second Language Task Complexity: Researching the Cognition Hypothesis of Language Learning and Performance. John Benjamins Publishing Company.
Jackson D. O., Suethanapornkul S. 2013. ‘The cognition hypothesis: A synthesis and meta-analysis of research on second language task complexity,’ Language Learning 63: 330–67.
Kim Y. 2009. ‘The effects of task complexity on learner–learner interaction,’ System 37: 254–68.
Kim Y., Payant C., Pearson P. 2015. ‘The intersection of task-based interaction, task complexity, and working memory,’ Studies in Second Language Acquisition 37: 549–81.
Long M. H. 1985. ‘A role for instruction in second language acquisition: task-based language teaching’ in Hyltenstam K., Pienemann M. (eds): Modelling and Assessing Second Language Acquisition. Multilingual Matters Ltd.
Long M. H. 1996. ‘The role of the linguistic environment in second language acquisition’ in Ritchie W. R., Bhatia T. J. (eds): Handbook of Second Language Acquisition. Academic Press.
Long M. H. 2015. ‘Task-based syllabus design’ in Long M. (ed.): Second Language Acquisition and Task-Based Language Teaching. Wiley.
Long M. H., Crookes G. 1992. ‘Three approaches to task-based syllabus design,’ TESOL Quarterly 26: 27–56.
Malicka A., Levkina M. 2012. ‘Measuring task complexity: Does L2 proficiency matter,’ Task-Based Language Teaching in Foreign Language Contexts: Research and Implementation: 43–66.
McNamara D. S., Louwerse M. M., Cai Z., Graesser A. 2013. ‘Coh-Metrix version 3.0,’ available at http://cohmetrix.com. Accessed August 2017.
Michel M. C. 2011. ‘Effects of task complexity and interaction on L2 performance,’ Second Language Task Complexity: Researching the Cognition Hypothesis of Language Learning and Performance 2: 141–73.
Michel M. C., Kuiken F., Vedder I. 2007. ‘The influence of complexity in monologic versus dialogic tasks in Dutch L2,’ International Review of Applied Linguistics in Language Teaching 45: 241–59.
Norris J. M. 2010. ‘Understanding instructed SLA: Constructs, contexts, and consequences,’ Plenary address delivered at the annual conference of the European Second Language Association (EUROSLA), Reggio Emilia.
Norris J. M., Ortega L. 2009. ‘Towards an organic approach to investigating CAF in instructed SLA: The case of complexity,’ Applied Linguistics 30: 555–78.
Révész A. 2014. ‘Towards a fuller assessment of cognitive models of task-based learning: Investigating task-generated cognitive demands and processes,’ Applied Linguistics 35: 87–92.
Révész A., Sachs R., Hama M. 2014. ‘The effects of task complexity and input frequency on the acquisition of the past counterfactual construction through recasts,’ Language Learning 64: 615–50.
Révész A., Michel M., Gilabert R. 2015. ‘Measuring cognitive task demands using dual task methodology, subjective self-ratings, and expert judgments: a validation study,’ Studies in Second Language Acquisition 28: 1–35.
Révész A., Kourtali N. E., Mazgutova D. 2017. ‘Effects of task complexity on L2 writing behaviors and linguistic complexity,’ Language Learning 67: 208–41.
Robinson P. 1995. ‘Task complexity and second language narrative discourse,’ Language Learning 45: 99–140.
Robinson P. 2001a. ‘Task complexity, cognitive resources, and syllabus design: A triadic framework for examining task influences on SLA’ in Robinson P. (ed.): Cognition and Second Language Instruction. Cambridge University Press.
Robinson P. 2001b. ‘Task complexity, task difficulty, and task production: Exploring interactions in a componential framework,’ Applied Linguistics 22: 27–57.
Robinson P. 2003. ‘The cognitive hypothesis, task design, and adult task-based language learning,’ Second Language Studies 21: 45–105.
Robinson P. 2005. ‘Cognitive complexity and task sequencing: studies in a componential framework for second language task design,’ International Review of Applied Linguistics in Language Teaching 43: 1–32.
Robinson P. 2007. ‘Criteria for classifying and sequencing pedagogic tasks’ in Mayo M. G. (ed.): Investigating Tasks in Formal Language Learning. Multilingual Matters Ltd.
Robinson P. 2011. ‘Second language task complexity, the cognition hypothesis, language learning, and performance’ in Robinson P. (ed.): Second Language Task Complexity: Researching the Cognition Hypothesis of Language Learning and Performance. John Benjamins Publishing Company.
Rostamian M., Fazilatfar A. M., Jabbari A. 2017. ‘The effect of planning time on cognitive processes, monitoring behavior, and quality of L2 writing,’ Language Teaching Research: 1–21.
Sasayama S. 2016. ‘Is a ‘complex’ task really complex? Validating the assumption of cognitive task complexity,’ The Modern Language Journal 100: 231–54.
Skehan P. 1996. ‘A framework for the implementation of task-based instruction,’ Applied Linguistics 17: 38–62.
Skehan P. 1998. A Cognitive Approach to Language Learning. Oxford University Press.
Skehan P. 2014. Processing Perspectives on Task Performance. John Benjamins Publishing Company.
Skehan P., Foster P. 1997. ‘Task type and task processing conditions as influences on foreign language performance,’ Language Teaching Research 1: 185–211.
Zakay D. 1992. ‘On prospective time estimation, temporal relevance and temporal uncertainty’ in Macar F., Pouthas F., Friedman W. J. (eds): Time, Action and Cognition: Towards Bridging the Gap. Springer Science & Business Media.

© Oxford University Press 2018. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices)

Task Complexity, Cognitive Load, and L1 Speech

Applied Linguistics , Volume Advance Article – Jan 2, 2018

Publisher
Oxford University Press
Copyright
© Oxford University Press 2018
ISSN
0142-6001
eISSN
1477-450X
D.O.I.
10.1093/applin/amx054
MODELS OF TASK COMPLEXITY

Long (1996, 2015) and Long and Crookes (1992) have proposed that during syllabus design, pedagogic tasks should be sequenced in an order of increasing complexity (task complexity, not linguistic complexity), eventually resembling the full demands of real-world target tasks that a needs analysis shows learners need to be able to perform successfully. A model consistent with this proposal was developed by Robinson (2003, 2007, 2011): the Triadic Componential Framework and the Cognition Hypothesis (CH). The Triadic Componential Framework specifies three superordinate categories of task characteristics: task complexity (cognitive factors), condition (interactive factors), and difficulty (learner factors). The fundamental claim of the CH is that increases in task complexity are the logical basis for task sequencing and syllabus design because attentional resources are increasingly engaged as the cognitive demands of tasks are increased. Three major predictions are made: (i) increases in task complexity along resource-directing dimensions will push learners to greater accuracy and linguistic complexity, but less fluency, (ii) greater task complexity will promote interaction and negotiation of meaning, leading to heightened attention to, and incorporation of, task input and modification of output, and (iii) individual differences in ability and affective variables contributing to perceptions of task difficulty will differentiate performance and learning as task complexity increases.

An alternative model for task sequencing was proposed by Skehan (1996, 1998) and Skehan and Foster (1997): the Limited Attentional Capacity Model (LACM), or the Trade-Off Hypothesis. Unlike the CH, the LACM assumes a single source of attention accessible to learners, whose limited capacity restricts the mapping of form-meaning relationships.
Skehan (1998) proposed that a learner can only attend to one aspect of linguistic performance, either accuracy or complexity, at the expense of the other. The trade-off between accuracy and complexity means that increased fluency may be accompanied by either greater accuracy or complexity (at best), but not by both at the same time (Skehan and Foster 1997). Task characteristics that are argued to affect the nature of performance include familiarity of information, interactivity, degree of structure, complexity of outcomes, and transformation of information.

TASK COMPLEXITY STUDIES AND THEIR LIMITATIONS

The LACM and CH have given rise to a plethora of studies, most of which focus on an increase in resource-directing dimensions that are relatively easy to operationalize, that is, ±Here-and-Now, ±few elements, and ±intentional reasoning. In addition, other factors, such as h/l (high/low) L2 proficiency, ±monologic, and ±planning-time have been manipulated in numerous studies. Support for either model has been cited, but in fact, many have obtained mixed/null findings or have drawn conclusions based on faulty interpretations.

In an effort to obtain support for the LACM, Foster and Skehan (1996) investigated the effects of task complexity and planning on language production. Thirty-two learners of English as a foreign language (EFL) with various L1 backgrounds performed three types of two-way oral tasks (personal information exchange, narrative, and decision-making). These tasks were assumed to differ in task complexity, with the personal information exchange task considered as the least complex, and the decision-making task the most complex. Participants were divided into three groups that differed in the conditions under which tasks were performed: no planning, undetailed planning, and detailed planning. The authors found that production was more fluent and more syntactically complex under planned conditions.
However, those in the undetailed planning condition produced the most accurate output. A significant interaction was found between task-type and planning condition such that planning effects on accuracy and linguistic complexity were greater in the more complex narrative and decision-making tasks than in the simple personal information exchange task. The authors concluded that their findings supported the LACM, since planning had a positive effect on fluency and complexity, but not accuracy, thus indicating a trade-off effect. Similar results were found in Skehan and Foster’s (1997) study. Forty EFL students with diverse L1 backgrounds performed the same task-types as in Foster and Skehan (1996). However, this study investigated the effects of planning (+planning vs. −planning) and ±knowledge of a post-task activity. It was found that students under planned conditions, as opposed to unplanned conditions, generally showed greater fluency, accuracy, and complexity in their oral output. When planning effects were compared across task-types, strong effects on accuracy were found in the narrative task but not the decision-making task. The reverse pattern was found for complexity, supporting the idea of a trade-off effect. Knowledge of a post-task activity had only a small effect on accuracy. In a small-scale study with 12 learners of English, Robinson (1995) found limited support for the CH. Participants of intermediate proficiency performed oral narrative tasks with increasing complexity along the ±Here-and-Now dimension. Support for the CH was found with regard to the proportions of lexical content words—learners used a greater variety of lexical words in the [−Here-and-Now] condition. However, no significant differences were found for accuracy and fluency, which were measured by five outcome measures. 
Robinson attributed the lack of significant findings to a small sample size, questionable reliability and validity of the outcome measures, relatively low proficiency of the learners, and use of one-way open tasks as opposed to two-way closed tasks.

Bearing in mind the limitations of earlier work, Robinson (2001b) conducted a more fine-grained study and investigated the effects of task complexity, task sequencing, and task role on learners’ interactive production and their perceptions of difficulty. He predicted that task complexity would have similar effects on language production in both monologic and interactive tasks, except with regard to linguistic complexity. The assumption was that the nature of a complex interactive task would lead to reduced linguistic complexity, due to greater numbers of elliptical/single-clause answers to clarification requests and confirmation checks. Forty-four Japanese learners of English performed two versions of a direction-giving map task (MT) (±Here-and-Now). Robinson found that in the complex task, token type ratio and number of words per clause were significantly lower, and the number of confirmation checks was significantly higher than in the simple version. Perceptions of difficulty and stress were higher in the complex task, and the task sequence of simple to complex resulted in higher accuracy and fluency. These findings are congruent with the predictions of the CH, in that complex tasks would lead to increased lexical diversity and use of confirmation checks, and reduced fluency.

With 46 university students of lower-intermediate English proficiency, Gilabert (2007) investigated how the ±Here-and-Now dimension interacted with ±planning time. Participants performed an oral narrative task with comic strips, and ±Here-and-Now was operationalized by present/past tense and presence/absence of contextual support. A total of 10 min were given in the [+planning time] condition, and 50 s in the [−planning time] condition.
Statistical analyses revealed that while there was a positive task complexity effect on accuracy and fluency in planned and unplanned conditions, structural complexity remained the same in both conditions. Increasing task complexity even reduced lexical complexity. Gilabert also found that the planned condition was more beneficial for fluency and lexical complexity. In short, this study supported predictions concerning ±planning time, but directions of the fluency and lexical complexity measures were counterevidence to the LACM and CH. Several studies compared the effects of monologic tasks with those of interactive tasks. In Michel et al.’s (2007) study, 44 learners of Dutch (L1 being either Moroccan or Turkish) performed an oral task that was increased in complexity along the ±few elements dimension. Participants were assigned to one of two groups: the monologic condition or the dialogic (dyad) condition. The oral task involved giving advice to a friend about buying a certain device. Those in the [+monologic] condition left a message over the phone, while those in the [−monologic] condition discussed the matter with their partner over the phone. The simple task involved two devices, and the complex task involved six. Results showed that increased task complexity promoted accuracy, slightly affected lexical complexity, but reduced fluency. Compared to monologic tasks, interactive tasks triggered greater accuracy and fluency, and lower structural complexity. An overall significant interaction between task complexity and interactivity was not found. When looking at interactions on specific outcome measures, learners were found to use more accurate speech in the complex monologic task. However, this beneficial effect disappeared in the complex dialogic task, which runs counter to the CH predictions of interactivity. 
In light of these results, the authors concluded that their study provided limited support for the CH, and rejected the existence of trade-off effects between accuracy and complexity. A similar study was conducted by Michel (2011), which investigated increased task complexity effects along the ±few elements dimension. This study distinguishes itself from others in that the researcher compared the performance of 64 learners of Dutch with that of 44 native speakers of Dutch. Two types of oral tasks were employed, in which participants had to choose the best dating or studying couple among four people (simple) or six people (complex). Michel found that increased complexity had a significant effect only on lexical diversity. There were no statistically significant differences between simple and complex tasks with regard to accuracy. Learners displayed greater accuracy, lexical diversity, and fluency on the interactive tasks, but less structural complexity on the monologic tasks. With respect to fluency measures, only native speakers showed a significant interaction effect between task complexity and interactivity such that they were more fluent in simple dialogues. Also taking into consideration the results of task difficulty judgments, which revealed that dialogues were considered easier than monologues, Michel concluded that monologues are cognitively more complex than dialogues, and that the study only provided limited support for the CH. Robinson’s and Skehan’s models of task complexity have generated considerable interest among educators and language researchers. However, much previous research is not without limitations. Three important problems include: (i) a lack of consistent operationalization of complexity dimensions, (ii) a lack of consistency in the choice and operationalization of outcome measures, and (iii) a failure to include native speaker data. 
Any one of these, or a combination, may have contributed to the mixed or null findings within and across studies.

The LACM claims that code complexity, cognitive complexity, and communicative stress may influence the learner in such a way that attention is allocated to certain aspects of linguistic performance. However, it fails to explain how each dimension is operationalized, and how the dimensions interact with one another. Likewise, the more intricate CH neither provides clear guidelines for operationalizing dimensions of task complexity nor suggests how task conditions and difficulty interact to affect performance. As a result, most researchers have manipulated task complexity along one or more of three dimensions: ±Here-and-Now, ±few elements, and ±intentional reasoning.

With regard to measures of linguistic performance, Ellis (2011) criticized early work (Foster and Skehan 1996; Robinson 1995) for its questionable operationalizations of complexity, accuracy, and fluency (CAF). Jackson and Suethanapornkul (2013) reported 84 different measures of complexity, accuracy, lexis, and fluency employed in just 17 CH studies. Norris and Ortega (2009) found 13 different measures of complexity alone in 16 studies. The number and types of performance measures show immense variation, and the lack of consistent findings may well be due to this variation.

The third problem relates to the omission of native speaker data. Although the two task complexity frameworks pertain to L2 task performance and development, it would be beneficial, and potentially necessary, to test predictions with L1 speakers to establish a systematic relationship between task design manipulations and performance. Unlike L2 learners, who vary widely in individual differences, native speakers ‘have a full, homogeneous, and comparable, command of their L1’ (Long 2015, p. 239).
As Foster and Tavakoli (2009) point out, native speaker data should be used as a baseline when investigating how learners perform language tasks because it enables us to distinguish the performance features that are due to L2 processing from those due to task performance. Using native speaker data initially ‘gives greater validity into claims that this is affected by the independent variable(s) under scrutiny’ (Foster and Tavakoli 2009, p. 868) and will provide a much more reliable window on task complexity effects. Any changes in performance, especially linguistic complexity, can be attributed to task complexity manipulations alone, without having to consider confounds resulting from task complexity effects having been filtered through the variability of learner competence (see Ellis 2011; and Long 2015).

VALIDATION OF TASK COMPLEXITY MANIPULATIONS

Numerous studies have claimed to find evidence supporting or refuting the LACM or the CH. However, Norris (2010) and Révész (2014) have argued that one important step has only been assumed and not empirically tested. To date, only a handful of studies have investigated whether task complexity manipulations actually lead to the desired changes in cognitive load, and whether those changes in turn cause an increase or reduction in accuracy, complexity, and fluency. Norris and Révész suggested four ways of addressing this issue: (i) subjective self-ratings, (ii) subjective time estimations, (iii) dual-task methodology, and (iv) psychophysiological techniques, such as eye-tracking. Stimulated recall protocols have also been used in recent TBLT studies as an introspective measure to tap into learner-internal processes (Kim et al. 2015; Révész et al. 2017; Rostamian et al. 2017).

Self-ratings of difficulty have been employed in several studies testing the CH (Baralt 2013; Gilabert et al. 2009, 2011; Ishikawa 2011; Kim 2009; Michel 2011; Robinson 2001b; Sasayama 2016).
In a study of 44 Japanese learners of English performing an oral interactive task, Robinson (2001b) found that increasing task complexity along the dimensions of amount of information and availability of prior knowledge had a significant effect on learners’ ratings of overall difficulty and stress. However, ratings of interest and motivation were found to be unrelated to task complexity manipulations. Using the questionnaire created by Robinson (2001b), Gilabert et al. (2009) found that 60 learners of English perceived complex tasks to be more difficult and stressful, and felt less confident when performing them. On the other hand, there were no significant differences in their interest and motivation between simple and complex versions of tasks. Ishikawa (2011) investigated the performance of 46 Japanese learners of English performing oral tasks of increasing complexity along the ±intentional reasoning dimension. He found that complex reasoning was rated as more difficult than simple or no reasoning. Such studies show that self-ratings of difficulty and/or stress can be used to measure the cognitive load of a task. Regarding self-estimations of duration, the rationale is that time seems to pass more quickly when a person is performing a difficult or attention-demanding task as opposed to one that is easy or less attention-demanding. Estimation of a target duration comes from two paradigms: the prospective paradigm (experienced duration) and the retrospective paradigm (remembered duration) (Block and Zakay 2008). In the former, a person is aware during a time period that she/he must estimate its duration. In the latter, she/he becomes aware of having to make an estimation only after the time period has ended. Block et al.’s (2010) meta-analysis of 117 experiments found that if difficult processing is required, the ratio of subjective duration to objective duration decreases when a person is aware that time judgments are to be made.
Put simply, people feel that time goes faster when performing more difficult tasks. Zakay (1992) and Block (1992) also found that prospective judgments were shorter for difficult tasks than for easy tasks. Attention models in psychology predict that processors of temporal and non-temporal information share the same attentional resources. When the load of non-temporal information is increased, fewer attentional resources are allocated to temporal information (Block 1992, 2003). Accordingly, people are assumed to be less accurate in estimating the duration of a task when greater cognitive load is placed on them. Several studies have employed time estimations to measure the cognitive complexity of tasks (Baralt 2013; Malicka and Levkina 2012). In Baralt’s (2013) investigation of differential modality effects, 84 adult learners of Spanish estimated how long it took for them to perform a story retelling task after task completion (i.e. retrospective paradigm), which was compared with the actual time that was spent on task performance. Results showed that the estimated times for the groups that carried out complex tasks (+intentional reasoning) were significantly longer than the actual task performance time, but those for the groups that carried out simple tasks (−intentional reasoning) were significantly shorter than the actual time. Malicka and Levkina (2012) also employed the retrospective paradigm in their attempt to validate task complexity effects. In this study, 37 learners of Spanish performed an instruction-giving task whose complexity was operationalized along ±number of elements and ±intentional reasoning, and then were asked to choose the task that they felt took longer to complete. However, they were not asked to estimate the time in minutes and/or seconds. 
The authors claimed that high-proficiency learners were less accurate than low-proficiency learners at time estimation, such that they believed it took longer to complete the complex task than it actually did. Although these studies are noteworthy for being among the first to introduce time estimations to task complexity research, there are three major problems that should be addressed. First, time should be estimated in terms of minutes and/or seconds for comparison with actual time by calculating the ratio of estimations to actual time, as in the present study. In cognitive psychology, this is a standard measure ‘so that all scores exist on the same relative scale’ (Brown 1985, p. 118). This enables the comparison of ratios not only within a single task-type, but also across different task-types. However, previous studies used subtractions instead of ratios, making it nearly impossible to make accurate comparisons across task-types and even within the same task-type. Second, asking participants to choose the task that they believe took longer to perform may lead them to think there are cognitive differences between tasks when in actuality they may not believe so. Third, because time estimations are subjective and may even vary widely within participants, it is important to subject all participants to this method, instead of comparing time estimations between groups that performed one version of the same task-type. Dual-task performance and brain activity measures are claimed to be objective, direct methods for measuring cognitive load (Brunken et al. 2003). The dual-task paradigm assumes that simultaneous performance of two tasks (primary and secondary) will have an impact on the distribution of attentional resources. The underlying principle is that ‘performance on the secondary task, assessed in terms of reaction time and accuracy, mirrors the level of cognitive load generated by the primary task’ (Révész 2014, p. 90).
When there are different versions of a primary task that require attentional resources, performance in processing a secondary task will vary according to the cognitive load induced by the primary task. Three studies of particular interest that have investigated the validity of task complexity manipulations were those of Révész et al. (2014, 2015), and Sasayama (2016). In the first study, the cognitive load of two computer-delivered tasks, whose complexity was increased along the ± causal reasoning dimension, was measured by means of expert judgments, dual-task methodology, and eye-tracking. Two doctoral students in applied linguistics were asked to judge all 32 experimental items, with results showing that the versions intended to be more complex were rated as such. While 16 native speakers of English and 16 ESL learners performed the primary task of choosing a correct past event and orally producing a past counterfactual statement, the color of the computer screen changed to red or green at random intervals. Participants had to respond to these changes as quickly and accurately as possible. Although reaction times on the secondary task did not differ significantly between simple and complex task versions, accuracy rates were found to be a sufficiently sensitive measure of cognitive load. Participants achieved higher accuracy rates when performing the simple versions, and native speakers achieved higher accuracy rates than the ESL learners. Eye-tracking also provided support for the validity of task complexity manipulations in terms of fixation counts and fixation duration. It was also found that ESL learners showed longer fixation durations than native speakers, but not higher counts. Révész et al. (2015) attempted to validate task complexity using the dual-task method, participant self-ratings, and expert judgments. 
Forty-eight English native speakers and 48 ESL speakers performed three oral task-types: a picture narrative, an MT, and a decision-making task, each with a simple and complex version. The researchers adopted the dual-task method in Révész et al. (2014), with the secondary task requiring participants to respond to screen color changes. Participants also completed a perception questionnaire regarding the mental effort required by the task and overall task difficulty. Sixty-one ESL teachers also provided their expert judgments by answering the perception questionnaire and explaining the reasons behind their answers. The dual-task method was found to be a good measure of cognitive load, with participants’ accuracy on the secondary task being higher on the simple task version than the complex version. However, task complexity effects were not found for reaction time. Both ESL learners’ and teachers’ self-rated perceptions of mental effort and task difficulty provided further support for the validity of task complexity manipulations, with ratings being higher for complex task versions. In short, complex task versions placed greater cognitive load on participants than simple versions. In Sasayama’s (2016) study, the dual-task method, time estimations, and self-ratings of task difficulty and mental effort were employed. Fifty-three adult Japanese learners of English, divided into three groups according to their L2 proficiency, participated in four narrative tasks. The number of elements determined task complexity, with each story involving one, two, four, or nine characters (named Tasks 1, 2, 3, and 4, respectively). For the secondary task, participants responded to letter-color changes. Although differences were not significant, it was found that reaction times on the secondary task for Task 4 were longer than those for Task 1, and those for Task 2 were longer than those for Task 3. 
Results of participants’ time estimations showed that they perceived Task 4 to be more complex than Task 1. However, time estimations for Task 2 were shorter than those for Task 3, suggesting that the former was perceived to be more complex than the latter that involved more characters in the story. With regard to self-assessments, Task 4 was found to be significantly more difficult and to require more mental effort than the other three tasks, and Task 2 was perceived to be more complex than Task 1. Different response patterns were found when comparing high- and low-proficiency groups, suggesting an interaction effect of L2 proficiency, task complexity, and measure of cognitive load. To explain why Task 2 placed greater cognitive demands than Task 3, two possibilities were suggested: storyline and picture quality, and code complexity. The three studies above used a combination of methods to see whether the intended complex tasks were actually cognitively complex. In general, evidence from these measures showed that complex tasks imposed greater cognitive load on the learner. Among the studies, only Révész et al. (2014) further examined learning outcomes, and hypothesized that those who received recasts while performing complex tasks (+reasoning demands) would show greater gains in L2 development than those who received recasts during simple tasks (−reasoning demands). However, counter to the authors’ predictions, there was no significant difference between the two groups in terms of written production, and the simple task group outperformed the complex task group in terms of oral production. Moreover, Révész et al. (2014) and Sasayama (2016) employed only one task-type, which raises questions about the generalizability of the results. 
Bearing these points in mind, the present study employed a combination of measures to validate task complexity manipulations of three task-types, and investigated the effects of task complexity on native speaker oral production in terms of syntactic complexity, lexical diversity, and lexical sophistication. THE PRESENT STUDY In light of the limitations of previous research, the present work sought to answer four research questions. RQ1. Do task complexity manipulations along the ± number of elements dimension lead to systematic changes in self-ratings of cognitive load? RQ2. Do task complexity manipulations along the ± number of elements dimension lead to systematic changes in duration judgments on tasks? RQ3. Do task complexity manipulations along the ± number of elements dimension lead to systematic changes in dual-task measures? RQ4. Do task complexity manipulations along the ± number of elements dimension lead to systematic changes in native speakers’ oral production? METHOD Participants Forty-two native speakers of English (18 males, 24 females) enrolled at a university in the USA participated in the study. Their ages ranged from 19 to 41 years at the time of study (M = 26.14, SD = 4.646). Tasks Three types of oral tasks were employed to maximize generalizability of findings and avoid type–token confounds. A resource-directing dimension of the CH, ± few elements, was manipulated such that each task-type had three versions of task complexity: least complex, mid-complex, and most-complex. In a MT, the learner had to find the quickest route from one place to another and tell an imaginary friend how to drive there. The number of obstacles on the road (e.g. a no-turn sign, one-way street, closed road, construction site, etc.) was manipulated so that an increasing number of obstacles forced participants to find a more complex route. 
In a Seating Arrangement task (SAT), the participant had to arrange the best seating plan for a number of people with certain preferences. It was assumed that the greater number of people and preferences would increase task complexity. In a Car Accident task (CAT), the learner watched a video clip of a car accident scene three times and reported the accident by pretending to be a news reporter. The number of cars and people involved in the accident determined the complexity of the task. As much as 2 min were provided as planning time for the MT and CAT, and as much as 5 min for the SAT. Table 1 illustrates in detail the number of elements involved in each task-type and version. Measures of cognitive load After performing each version of a task, participants completed a questionnaire in which nine-point Likert scales were used to answer questions about (i) overall perceived task difficulty, (ii) level of mental effort they thought was required for task performance, and (iii) level of stress they felt during task performance. They were also asked to estimate the time it took for planning and performing the task separately. Although it was assumed that both times would be affected by the complexity of the task, precaution of separating the two was taken to prevent participants’ awareness of preparation time from confounding the results. These time estimations were later used to calculate the ratio of subjective duration to objective duration as one measure of cognitive load. It was in the prospective paradigm, as participants were aware prior to, or immediately upon, performance that they needed to make a duration judgment. As part of the dual-task method, participants performed a secondary simple choice reaction task while simultaneously performing the primary oral task. Screen color changes were employed to ensure that differences in the performance of the secondary task were a reflection of changes at a cognitive level, and not a perceptual level (Cierniak et al. 
2009). While participants were performing the primary task, the laptop screen before them changed colors from either white to green or white to red at intervals of 2,500 ms. Participants were required to react as quickly and accurately as possible to the color changes, pressing the left shift key if the screen changed from white to green, and the right shift key if it changed from white to red. The primary and secondary tasks were run through DMDX, and participants’ error rates and reaction times were recorded. Accuracy was calculated by dividing the number of correct responses to color changes by the total number of changes. Only correct responses were considered for reaction time. Procedure Participants met with the researcher individually for a 1-h session. They first completed a language background questionnaire (adapted from Ellis 2011), and then performed a series of practice items whose format was identical to the tasks in the test phase. A sample item of each task-type was provided. To prevent sequencing effects, the order of the tasks was pseudo-randomized such that three blocks containing one version of each task-type were scrambled (see Table 2). While performing the primary oral tasks, participants responded to screen color changes as a secondary task. Following each task version, they completed a questionnaire regarding cognitive load self-ratings and estimated the time they had spent on planning and speech during task performance. 
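The scoring of the cognitive load measures described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Trial` record, the function names, and the sample values are hypothetical and do not reproduce the study's DMDX output or data, and counting a missed response as incorrect is an assumption rather than something the text specifies. The sketch computes the duration judgment ratio (estimated time divided by actual time), accuracy (correct responses divided by the total number of color changes), and mean reaction time over correct responses only.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional, Tuple


@dataclass
class Trial:
    """One screen color change in the secondary task (hypothetical record layout)."""
    expected_key: str            # "left" for white-to-green, "right" for white-to-red
    pressed_key: Optional[str]   # None when the participant did not respond
    rt_ms: Optional[float]       # reaction time in milliseconds, None if no response


def duration_ratio(estimated_s: float, actual_s: float) -> float:
    """Ratio of subjective (estimated) to objective (actual) duration."""
    if actual_s <= 0:
        raise ValueError("actual duration must be positive")
    return estimated_s / actual_s


def score_secondary_task(trials: List[Trial]) -> Tuple[float, Optional[float]]:
    """Return (accuracy, mean reaction time in ms over correct responses only)."""
    if not trials:
        raise ValueError("at least one color change is required")
    # Assumption: a missing or wrong key press both count as incorrect.
    correct = [t for t in trials if t.pressed_key == t.expected_key]
    accuracy = len(correct) / len(trials)
    mean_rt = mean(t.rt_ms for t in correct) if correct else None
    return accuracy, mean_rt


# Illustrative values only; these are not data from the study.
trials = [
    Trial("left", "left", 1200.0),    # correct
    Trial("right", "left", 900.0),    # wrong key, excluded from reaction time
    Trial("right", "right", 1600.0),  # correct
    Trial("left", None, None),        # no response
]
accuracy, mean_rt = score_secondary_task(trials)
ratio = duration_ratio(estimated_s=90.0, actual_s=120.0)
print(accuracy, mean_rt, ratio)
```

A ratio below 1 means the participant judged the interval to be shorter than it actually was. Because the ratio is unitless, values from tasks of different lengths sit on the same relative scale.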
Table 1: Number of task elements

Task-type   Type of element   Least complex   Mid-complex   Most complex
MT          Obstacles         0               2             4
SAT         Guests            4               6             8
CAT         Cars and people   1               3             10

Table 2: Sample of task randomization

Participant   Task sequence
              1st     2nd     3rd     4th     5th     6th     7th     8th     9th
1             CAT 2   SAT 2   MT 2    SAT 1   MT 1    CAT 3   MT 3    SAT 3   CAT 1
2             SAT 2   MT 2    CAT 1   SAT 1   MT 3    CAT 3   MT 1    CAT 2   SAT 3
3             MT 3    SAT 1   CAT 2   MT 1    CAT 3   SAT 2   MT 2    CAT 1   SAT 3
4             MT 1    SAT 2   CAT 1   SAT 3   CAT 2   MT 2    SAT 1   CAT 3   MT 3
5             MT 3    SAT 3   CAT 3   MT 2    CAT 2   SAT 1   MT 1    CAT 1   SAT 2
6             CAT 1   MT 3    SAT 2   CAT 3   SAT 1   MT 1    CAT 2   MT 2    SAT 3

Note: 1 = least complex; 2 = mid-complex; 3 = most complex.
Linguistic outcome measures Participants’ speech production was assessed in terms of syntactic complexity, lexical diversity, and lexical sophistication. Number of clauses per AS-unit, number of subordinate clauses per AS-unit, and mean length of AS-unit (number of words per AS-unit) were used to measure syntactic complexity. Lexical diversity was measured using Guiraud’s (1954) Index of Richness (a mathematical transformation of the type–token ratio that takes text length into consideration), MTLD (a Measure of Textual Lexical Diversity), and VOCD (also known as the D-measure). To assess lexical sophistication, the proportion of academic words in speech was analyzed by investigating frequency bands with the most common 1,000 words (K1), the next common 1,000 words (K2), the academic words of English (the AWL, 550 frequent words in academic texts, Coxhead 2000), and off-list words (the remainder not found on other lists). Two raters independently scored the entirety of the spoken data.
Inter-rater reliability (Krippendorff's alpha) was .874 for number of clauses per AS-unit, .708 for number of subordinate clauses per AS-unit, and .885 for mean length of AS-unit, indicating good agreement between the raters. Discrepancies were later reviewed, reconciled, and recoded. The number of types and tokens were counted using Wordsmith, a lexical analysis software. The researcher subsequently used Guiraud’s Index to calculate one measure of lexical diversity. MTLD and VOCD were calculated by using a Web-based software tool called Coh-Metrix (McNamara et al. 2013). VocabProfile (Cobb 2002), a Web-based software tool that performs lexical text analysis, was used to analyze vocabulary frequency bands. Items that were outliers, defined as three SDs from the mean (3 from self-ratings, 10 from duration judgments, 17 from dual-task method results, and 11 from linguistic outcome measures), were detected and excluded from analyses. RESULTS Self-ratings of cognitive load Table 3 shows the descriptive statistics for self-ratings of perceived task difficulty, mental effort, and stress. Regardless of task-type, participants felt that the more complex tasks were more difficult, required more mental effort, and were more stressful. Figure 1 provides visual information of the changes in self-ratings. 
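For readers who want to reproduce two of the computations mentioned in the method section, the following Python sketch shows Guiraud's index (number of types divided by the square root of the number of tokens) and a three-SD outlier screen. It is a minimal illustration, not the authors' procedure: the type and token counts in the study came from Wordsmith, whereas the whitespace tokenization and lowercasing here are simplifying assumptions, as are the function names, the use of the sample standard deviation, and the reading of the outlier rule as "more than three SDs away from the mean".

```python
import math
from statistics import mean, stdev
from typing import List


def guiraud_index(tokens: List[str]) -> float:
    """Guiraud's index: number of types divided by the square root of the number of tokens."""
    if not tokens:
        raise ValueError("at least one token is required")
    # Assumption: types are case-insensitive word forms.
    types = {token.lower() for token in tokens}
    return len(types) / math.sqrt(len(tokens))


def drop_outliers(values: List[float], n_sd: float = 3.0) -> List[float]:
    """Keep only values within n_sd sample standard deviations of the mean."""
    if len(values) < 2:
        return list(values)
    centre = mean(values)
    spread = stdev(values)  # sample SD; an assumption, the text does not say which SD was used
    return [v for v in values if abs(v - centre) <= n_sd * spread]


# Illustrative input only; not data from the study.
tokens = "the cat saw the dog".split()   # 5 tokens, 4 types
print(guiraud_index(tokens))

scores = [10.0] * 20 + [100.0]           # one extreme value among 21 observations
print(len(drop_outliers(scores)))
```

Dividing by the square root of the token count, rather than by the token count itself as in a plain type–token ratio, is what makes the index less sensitive to text length.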
Table 3: Mean and SD of self-ratings

Task-type              MT                                        SAT                                       CAT
Complexity             1             2             3             1             2             3             1             2             3
Perceived difficulty   3.71 (1.38)   5.02 (1.73)   6.24 (1.49)   4.57 (1.88)   5.88 (1.38)   7.95 (1.09)   4.38 (1.55)   5.17 (1.58)   6.86 (1.37)
Mental effort          3.67 (1.52)   5.05 (1.90)   5.86 (1.62)   4.50 (1.81)   5.79 (1.52)   7.75 (1.03)   4.33 (1.68)   5.31 (1.81)   6.62 (1.45)
Stress                 3.43 (1.38)   4.57 (1.82)   4.88 (1.95)   4.02 (1.98)   4.67 (1.72)   6.55 (1.89)   3.71 (1.40)   4.48 (1.70)   5.40 (1.80)

Note: Complexity 1 indicates least complex, 2 mid-complex, and 3 most complex.
Figure 1: Task complexity increases and self-ratings

To answer the first research question, three linear mixed models were run on the self-ratings of cognitive load. The fixed-effects variables were task-type and task complexity, and the random-effects variables were participants and task items. To filter out unnecessarily complicated models, the model with the lowest Schwarz’s Bayesian information criterion (BIC) was chosen out of the whole set of candidates. As a result, the Compound Symmetry model was ultimately selected for all three models. Restricted maximum likelihood estimates were used. In the case of perceived difficulty, significant main effects were found for task-type and task complexity, F(2, 326.774) = 24.119, p < .001 and F(2, 326.774) = 146.665, p < .001, respectively. However, their interaction was not significant, F(4, 326.773) = 2.056, p = .086. Marginal and conditional R²s for the fixed effects were found to be R²GLMM(m) = .33 and R²GLMM(c) = .58. Pairwise comparisons showed that when task complexity was factored out, the SAT was considered significantly more difficult than the MT and CAT, and the CAT was perceived to be significantly more difficult than the MT. When task-type was ignored, there were significant differences between each level of complexity. In other words, the most complex task versions were perceived to be the most difficult, followed by the mid-complex, and then the least complex versions. When a mixed model was conducted with mental effort as the outcome variable, a significant interaction between task-type and task complexity was obtained, F(4, 325.762) = 2.414, p = .049, indicating that task complexity had different effects on mental effort depending on task-type.
A simple effects test was then conducted, and results showed that the most complex versions of the SAT and the CAT required significantly more mental effort than the mid-complex versions, which in turn required significantly more than the least complex versions (for the SAT, p < .001 for all comparisons; for the CAT, p < .001 for comparisons between the most complex versions and the other two versions, and p = .003 for the comparison between the mid-complex and least complex versions). In the case of the MT, the least complex version was considered to require significantly less mental effort than the other two more complex versions (p < .001 for the two comparisons). Significant main effects for task-type and task complexity were also found, F(2, 325.763) = 22.561, p < .001 and F(2, 325.763) = 115.517, p < .001, respectively. Pairwise comparisons revealed significant differences between each level of task-type and task complexity. Again, the greatest level of mental effort was required for the most complex task versions and the SAT. The CAT required more effort than the MT, and participants felt they expended more mental effort on the mid-complex task versions than the least complex versions. Marginal and conditional R²s for the fixed effects were R²GLMM(m) = .28 and R²GLMM(c) = .57. A mixed model run on stress ratings obtained significant main effects for task-type and task complexity, as well as a significant interaction, F(2, 328.00) = 12.843, p < .001; F(2, 328.00) = 70.812, p < .001; and F(3, 328.00) = 4.445, p < .005, respectively. Marginal and conditional R²s for the fixed effects were found to be R²GLMM(m) = .15 and R²GLMM(c) = .60. Because there was a significant interaction effect, a simple effects test was conducted. It was found that the most and mid-complex versions of the MT were significantly more stressful than the least complex version (p < .001 for both comparisons).
In the case of the SAT, the most complex version was significantly more stressful than the other two less complex versions (p < .001 for both comparisons). The most complex CAT version was found to be significantly more stressful than the mid-complex (p = .001) and least complex version (p < .001), and the mid-complex version was significantly more stressful than the least complex version (p < .001). When task complexity was ignored, the SAT was significantly more stressful than the MT and CAT. When task-type was factored out, significant differences were found between each level of task complexity, with stress ratings increasing as task complexity increased.

Prospective duration judgments

Table 4 and Figure 2 display the descriptive statistics and patterns for planning and speech duration judgment ratios. Planning ratios for the CAT showed a very different pattern from those of the MT and SAT, which seemed to be consistent across all task complexity levels. Speech ratios for the CAT were also slightly different, but there appears to be a general pattern in that speech duration judgment ratios decreased as task complexity increased.

Table 4: Mean and SD of duration judgment ratios

Task-type      MT                                              SAT                                             CAT
Complexity     1               2               3               1               2               3               1               2               3
Planning (s)   1.25 (0.86)     1.29 (0.70)     1.18 (0.57)     1.06 (0.49)     1.09 (0.44)     0.98 (0.21)     7.47 (8.79)     8.51 (9.44)     10.31 (10.73)
Speech (s)     1.59 (0.89)     1.72 (0.95)     1.37 (0.82)     1.52 (0.91)     1.53 (0.79)     1.26 (0.63)     1.86 (1.23)     1.64 (0.94)     1.58 (0.96)

Note: Complexity 1 indicates least complex, 2 mid-complex, and 3 most complex.
Figure 2: Task complexity increases and duration judgment ratios

Two separate linear mixed models were conducted, with the Heterogeneous Compound Symmetry model chosen for planning duration judgment ratio, and the Compound Symmetry model selected for speech duration judgment ratio. Again, restricted maximum likelihood estimates were used. With respect to planning, a significant main effect was found for task-type, F(2, 84.981) = 27.505, p < .001. However, the main effect for task complexity and the interaction between the two were not significant, F(2, 77.657) = 1.107, p = .336 and F(4, 98.832) = .638, p = .637, respectively. Marginal and conditional R²s for the fixed effects were R²GLMM(m) = .21 and R²GLMM(c) = .38. Pairwise comparisons revealed that the CAT produced a significantly higher ratio than the other two task-types, with the ratio for the MT being significantly higher than that for the SAT. When the speech duration judgment ratio was the outcome variable, significant main effects were found for task-type and task complexity, F(2, 305.487) = 5.158, p < .01 and F(2, 305.487) = 8.600, p < .001, respectively.
However, their interaction was non-significant, F(4, 305.624) = 1.314, p = .265. Marginal and conditional R²s for the fixed effects were found to be R²GLMM(m) = .02 and R²GLMM(c) = .64. Pairwise comparisons revealed that when task complexity was ignored, the speech ratio for the CAT was significantly higher than that for the SAT. When task-type was factored out, the speech ratio for the most complex task versions was significantly lower than those for the mid-complex and least complex versions.

Dual-task outcome measures

Descriptive statistics for dual-task outcome measures are displayed in Table 5. Participants’ accuracy seemed to be consistent across all task-types and versions. As illustrated in Figure 3, reaction times appear to have increased with greater task complexity, with the exception of the CAT.

Table 5: Mean and standard deviation of dual-task outcome measures

Task-type            MT                                                                SAT                                                               CAT
Complexity           1                     2                     3                     1                     2                     3                     1                     2                     3
Accuracy             0.97 (0.07)           0.96 (0.06)           0.96 (0.06)           0.97 (0.08)           0.96 (0.06)           0.95 (0.06)           0.97 (0.07)           0.98 (0.04)           0.95 (0.07)
Reaction time (ms)   1,590.27 (921.10)     1,725.23 (839.97)     1,756.67 (761.97)     1,590.92 (941.28)     1,742.63 (720.68)     2,428.02 (1,514.83)   1,385.98 (652.15)     1,185.54 (358.04)     1,448.66 (586.23)

Note: Complexity 1 indicates least complex, 2 mid-complex, and 3 most complex.
Figure 3: Task complexity increases and dual-task outcomes

To see whether task-type and task complexity had an effect on the secondary task outcomes, two separate linear mixed models were fitted with participants and task items as random effects. Using Schwarz’s BIC, the Compound Symmetry covariance structure was ultimately selected for accuracy and the Heterogeneous Compound Symmetry structure was chosen for reaction time. Restricted maximum likelihood estimates were utilized.

In the case of accuracy, the main effects for task complexity and task-type and their interaction were not significant, F(2, 304.083) = 2.995, p = .051; F(2, 304.083) = .482, p = .618; and F(4, 304.240) = .714, p = .583, respectively. Marginal and conditional R² values were R²GLMM(m) = .01 and R²GLMM(c) = .14.

On the other hand, for reaction time, significant effects were found for task-type, task complexity, and their interaction, F(2, 91.743) = 22.341, p < .001; F(2, 90.087) = 13.898, p < .001; and F(4, 125.718) = 4.452, p < .005, respectively.
Marginal and conditional R² values were R²GLMM(m) = .04 and R²GLMM(c) = .67. Because the interaction was significant, a simple effects test was conducted. Results showed that the reaction time on the most complex MT was significantly longer than that on the least complex version (p = .013), and the reaction time on the most complex SAT was significantly longer than those on the mid-complex (p = .005) and least complex versions (p < .001). The reaction time on the mid-complex SAT version was also significantly longer than that on the least complex version (p = .027). Regarding the significant main effects, pairwise comparisons revealed that, when task complexity was ignored, reaction times were significantly longer on the SAT, followed by the MT and then the CAT. When task-type was factored out, reaction times were significantly longer on the most complex task versions than on the two less complex versions.

Linguistic outcome measures

Descriptive statistics for the six linguistic outcome measures are presented in Table 6. Table 7 displays descriptive statistics for the measure of lexical sophistication. As depicted in Figure 4, the syntactic complexity measures generally followed an inverted V-shaped pattern as task complexity increased. In other words, the mid-complex task versions seem to have generated the most complex structures. On the other hand, Figure 5 shows that lexical diversity appears to have increased with greater task complexity.
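Two of the measures reported in Table 6 can be sketched in a few lines. The example below computes Guiraud's Index in its standard form (number of word types divided by the square root of the number of word tokens) and a clauses-per-AS-unit ratio. The tokenizer, the function names, and the clause and AS-unit counts are illustrative assumptions only; the sketch does not reproduce the coding procedure used in the study, where AS-units and clauses would be identified by trained coders.

```python
import math
import re


def tokenize(transcript: str) -> list[str]:
    """Lowercase the transcript and keep word forms (hypothetical tokenizer)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", transcript.lower())


def guiraud_index(transcript: str) -> float:
    """Guiraud's Index: word types divided by the square root of word tokens."""
    tokens = tokenize(transcript)
    if not tokens:
        raise ValueError("transcript contains no word tokens")
    return len(set(tokens)) / math.sqrt(len(tokens))


def clauses_per_as_unit(n_clauses: int, n_as_units: int) -> float:
    """Syntactic complexity ratio from hand-coded clause and AS-unit counts."""
    if n_as_units <= 0:
        raise ValueError("at least one AS-unit is required")
    return n_clauses / n_as_units


# Opening of the simplified MT example from Table 8.
sample = "Turn right on Main Street. Drive two blocks. Turn left on Lincoln Avenue."
print(round(guiraud_index(sample), 2))

# Hypothetical hand-coded counts, for illustration only.
print(clauses_per_as_unit(n_clauses=3, n_as_units=3))
```

Because the square root in the denominator grows more slowly than the token count, Guiraud's Index is less sensitive to text length than a raw type-token ratio, although it is not fully independent of it.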
Table 6: Mean and standard deviation of linguistic outcome measures
Task-type                       MT             MT             MT             SAT            SAT            SAT            CAT            CAT            CAT
Complexity                      1              2              3              1              2              3              1              2              3
Clause per AS-unit              1.80 (0.74)    1.72 (0.43)    1.75 (0.62)    2.65 (0.72)    3.01 (0.86)    2.61 (0.92)    2.31 (0.78)    2.47 (0.86)    2.31 (0.73)
Subordinate clause per AS-unit  0.45 (0.50)    0.43 (0.34)    0.47 (0.44)    1.43 (0.61)    1.72 (0.76)    1.41 (0.74)    0.92 (0.63)    1.18 (0.73)    0.89 (0.50)
Mean length of AS-unit          14.52 (5.42)   14.28 (4.72)   14.46 (6.04)   21.45 (5.52)   21.20 (6.05)   19.50 (4.75)   22.41 (10.29)  19.58 (6.23)   20.42 (7.29)
Guiraud’s Index                 4.72 (0.50)    4.71 (0.47)    4.70 (0.60)    5.25 (0.65)    5.17 (0.79)    5.56 (0.85)    5.35 (0.84)    5.71 (0.71)    5.79 (0.70)
MTLD                            32.67 (0.95)   32.14 (0.85)   33.53 (1.06)   39.67 (1.32)   35.85 (1.50)   40.78 (1.71)   48.92 (2.49)   48.10 (2.02)   47.65 (2.56)
VOCD                            7.14 (1.92)    16.20 (2.00)   19.92 (2.09)   21.95 (3.13)   24.46 (2.48)   37.51 (2.04)   10.12 (3.33)   16.15 (3.78)   26.57 (3.86)
Note: Complexity 1 indicates least complex, 2 mid-complex, and 3 most complex.
Table 7: Average percentage and standard deviation of word bands
Task-type       MT            MT            MT            SAT           SAT           SAT           CAT           CAT           CAT
Complexity      1             2             3             1             2             3             1             2             3
K1 words        79.25 (5.23)  81.88 (4.07)  82.88 (3.75)  91.01 (4.24)  84.11 (4.37)  88.18 (5.20)  81.19 (5.09)  76.92 (4.98)  77.86 (4.60)
K2 words        7.94 (3.59)   8.02 (2.62)   6.29 (2.28)   4.88 (2.19)   10.28 (5.17)  3.10 (2.86)   4.78 (3.47)   8.24 (3.57)   8.22 (2.60)
AWL words       0.42 (1.19)   0.23 (0.47)   0.52 (0.93)   0.48 (0.83)   0.68 (1.07)   1.00 (1.15)   1.00 (1.58)   1.40 (2.08)   1.41 (1.56)
Off-list words  12.40 (3.60)  9.86 (3.55)   10.31 (3.29)  3.63 (3.30)   4.93 (3.81)   7.72 (4.63)   13.03 (4.83)  13.45 (4.24)  12.51 (3.54)
Note: Complexity 1 indicates least complex, 2 mid-complex, and 3 most complex.
Table 8: Examples of task simplification and proper task completion

MT
Example of task simplification: Turn right on Main Street. Drive two blocks. Turn left on Lincoln Avenue. Drive one block. Turn left on Downtown Street? Drive one block. Turn right on Jefferson Avenue? Drive one block? Then turn left on Pine Street? Drive one block? Then turn right on Washington Avenue. The post office will be on your right
Example of good task completion: Okay so to get to the post office you're going to make a right, onto Main Street, which is going to be your first right, because there is road work up ahead. Um after making this right you're going to continue straight passing the pet store supermarket hospital and police station? Because you cannot make a left onto Jefferson Avenue, so once you're on the end of Main Street you're going to make that left onto Lincoln Avenue. And you're going to make your next left that you can, right onto Downtown Street. You will pass a shop, and then you will continue that block and make a right onto Jefferson Avenue? Uh you will see a cinema, if you're going in the correct direction? And you will continue straight until you approach a library, where you will then make a left? And you will see a pizza place and if you continue, you will see the post office on your right, and make that right, across from the playground, and you will find yourself right on Washington Avenue at the post office

SAT
Example of task simplification: Okay so head of State A will sit at the bottom right? Um head of State B will sit to the right of head of State A. Uh head of State G will sit right at the bathroom um and on the top right uh head of State E will sit. Um on the left side at the bottom left head of State H will sit at the bottom? Head of State D will sit to the left of head of State H. Um head of State C will sit to the left of head of State D and head of State C. And head of State F will sit to the left of head of State C
Example of good task completion: Alright, so, I guess I will start with the closest to the restroom, which will be G because they aren't feeling well and wanna sit close to the restroom. And going around to the left, will be A, who is sitting here because D is on the other side of the room and they're at war with them and don't wanna sit near them. To the left of A, is um B, and they are sitting far enough away from E, where they won't bother each other because they're at war with each other. Uh to the left of B is H, who is going to be late and they wanna be able to sneak in. To the left of H is D, who is at war with A so they're far apart from each other. And D also does not like meat, they will be eating seafood. Uh to the left of D is C, who wanted to be seated next to uh someone they didn't wanna be seated next to a woman who aren't their wives, so they're between two men, which leaves to their left who is F, which is a man, and also wanted uh seafood and wanted to strengthen their terms with C so this works out in everyone's favor? And then lastly is E, to the left of them, who enjoy meat, and yep, and are at war with B. So that's why they wanna sit far away from them

CAT
Example of task simplification: Three vehicles are sliding and spinning down a steep snow-covered hill, a red car, a red SUV, and a white truck. The SUV collides with the white car on the side of the road, and comes to a stop. Moments later, the SUV is struck by a large red four by four, quite severely. The other two vehicles have escaped with minor damages
Example of good task completion: In this scene, we see an icy road condition uh on a hill, which was obviously not good enough to- for some of these cars to drive. Now, we see three cars sliding down a icy road hill. Um there's a white pickup truck there's a black hatchback and another dark colored sedan. The- um the scene pans out and we see that the sedan hits a white car that is stopped, and we also begin to see four other cars on the scene. Immediately, and with accelerated pace, we then see a red SUV sprinting down the icy road hill. And- and that hits the black hatchback car that we see at the beginning of the screen. The footage shows the other cars that are stopped stationary, most likely unable to drive in these treacherous conditions. However at the same time we do see a Fedex truck that's able to cruise by. Um there's also a few people on the scene we assume that they are the drivers that are unable to uh stay in their car and they wanna stay safe, because of the cars that are sliding down um the icy road hill. Um meanwhile um these people are closing the doors and trying to see if their cars are okay, but these are just not conditions that we should be driving in

Note: "?" indicates rising intonation, not necessarily a question. "." indicates falling intonation, end of utterance. "," indicates low-rising intonation that suggests continuation. "-" indicates false-start, self-correction, or self-interruption.

Figure 4: Task complexity increases and syntactic complexity outcome measures

Figure 5: Task complexity increases and lexical diversity outcome measures

Linear mixed models were run for each outcome measure, with task-type and task complexity as fixed effects, and participants and task items as random effects. Based on the smallest value of Schwarz’s BIC, the Compound Symmetry covariance structure was chosen for the number of clauses per AS-unit and Guiraud’s Index, and the Heterogeneous Compound Symmetry structure for the number of subordinate clauses per AS-unit, mean length of AS-unit, MTLD, and VOCD. Restricted maximum likelihood estimates were utilized.

Results of a mixed model conducted on number of clauses per AS-unit revealed a significant main effect for task-type, F(2, 310.974) = 75.060, p < .001. The main effect for task complexity and its interaction with task-type were not significant, F(2, 310.974) = 2.711, p = .068 and F(4, 310.973) = 1.542, p = .190, respectively. Marginal and conditional R² values were R²GLMM(m) = .08 and R²GLMM(c) = .51.
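The two R² values reported throughout this section can be read as follows, assuming they follow the usual definitions for Gaussian mixed models: the marginal value is the share of total variance attributable to the fixed effects (task-type and task complexity), and the conditional value adds the variance attributable to the random effects (participants and task items). The sketch below uses made-up variance components chosen only so that the output matches the values for clauses per AS-unit (.08 and .51); the function name and the component values are hypothetical, not taken from the study.

```python
def r2_glmm(var_fixed: float, var_random: float, var_residual: float) -> tuple[float, float]:
    """Return (marginal, conditional) R-squared from variance components."""
    total = var_fixed + var_random + var_residual
    if total <= 0:
        raise ValueError("total variance must be positive")
    marginal = var_fixed / total                    # fixed effects only
    conditional = (var_fixed + var_random) / total  # fixed plus random effects
    return marginal, conditional


# Hypothetical variance components, scaled so that they sum to 1.
marginal, conditional = r2_glmm(var_fixed=0.08, var_random=0.43, var_residual=0.49)
print(f"marginal = {marginal:.2f}, conditional = {conditional:.2f}")
print(f"share attributable to random effects = {conditional - marginal:.2f}")
```

On this reading, a small marginal value next to a much larger conditional value indicates that differences among participants and task items account for more of the variance than the task manipulations themselves.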
In terms of clauses per AS-unit, the SAT elicited significantly more complex structures than the other two task-types, and the CAT elicited significantly more complex structures than the MT.

In the case of the number of subordinate clauses per AS-unit, significant main effects were found for task-type and task complexity, F(2, 190.964) = 124.212, p < .001 and F(2, 207.378) = 4.10, p < .05, respectively. However, their interaction was not significant, F(4, 164.507) = 2.407, p = .052. Marginal and conditional R² values were R²GLMM(m) = .07 and R²GLMM(c) = .60. Ignoring task complexity, the SAT produced significantly more complex structures than the other two task-types, and the CAT generated significantly more complex structures than the MT. When task-type was factored out, the mid-complex task versions elicited significantly more complex structures than the most and least complex versions.

When a mixed model was run on mean length of AS-unit, a significant main effect for task-type was found, F(2, 140.08) = 68.211, p < .001, indicating that the SAT and the CAT elicited significantly more words per AS-unit than the MT. However, the main effect for task complexity and the task-type × task complexity interaction were not significant, F(2, 222.065) = 1.626, p = .199 and F(4, 146.918) = .991, p = .414, respectively. Marginal and conditional R² values were R²GLMM(m) = .14 and R²GLMM(c) = .42.

Results of a mixed model conducted on lexical diversity in terms of Guiraud’s Index showed significant effects for task-type, task complexity, and their interaction, F(2, 312.00) = 80.092, p < .001; F(2, 312.00) = 5.823, p < .005; and F(4, 312.00) = 3.272, p < .05, respectively. Marginal and conditional R² values were R²GLMM(m) = .24 and R²GLMM(c) = .51. The significant interaction indicated that task complexity effects differed depending on task-type.
A simple effects test showed that the most complex SAT version elicited significantly more diverse vocabulary than the mid-complex (p = .034) and least complex versions (p = .027). In the case of the CAT, the most and mid-complex versions elicited significantly more diverse vocabulary than the least complex version (p = .004 and p = .002, respectively). Pairwise comparisons revealed that the CAT produced significantly more diverse vocabulary, followed by the SAT, and then the MT. When task-type was ignored, the most complex task versions produced significantly more diverse vocabulary than the least complex versions.

In the case of MTLD, a significant main effect was found for task-type, F(2, 159.918) = 61.705, p < .001, but not for task complexity or their interaction, F(2, 142.394) = 1.355, p = .261 and F(4, 109.413) = .843, p = .501, respectively. Marginal and conditional R² values were R²GLMM(m) = .25 and R²GLMM(c) = .30. Pairwise comparisons between task-types showed that the CAT elicited significantly greater lexical diversity than the SAT and the MT, and the SAT elicited significantly greater lexical diversity than the MT.

Results of a mixed model conducted on VOCD revealed significant main effects for task-type and task complexity, F(2, 131.692) = 38.770, p < .001 and F(2, 191.016) = 32.282, p < .001, respectively, but a non-significant interaction, F(4, 133.948) = 1.890, p = .116. Marginal and conditional R² values were R²GLMM(m) = .09 and R²GLMM(c) = .48. When task complexity was factored out, the SAT elicited significantly more diverse vocabulary than the MT and the CAT (p < .001 for both comparisons).
When task-type was ignored, the most complex versions elicited significantly more diverse vocabulary, followed by the mid-complex versions, and then the least complex versions (p < .001 for comparisons between the most complex and mid- or least complex versions, and p = .007 for the comparison between the mid-complex and least complex versions). To find out whether increases in task complexity affected the proportion of academic words (AWL) in speech, a 3 × 3 repeated measures analysis of variance was computed with task-type and task complexity as the within-subjects variables. Because Mauchly's Test of Sphericity indicated that the assumption of sphericity had been violated for task-type, χ²(2) = .711, p = .002, a Huynh–Feldt correction was used. Results revealed a significant main effect for task-type, F(1.604, 62.545) = 14.107, p < .001, partial η² = .266, but a non-significant main effect for task complexity and a non-significant interaction, F(2, 78) = 2.660, p = .076, partial η² = .064 and F(4, 156) = .705, p = .590, partial η² = .018, respectively. Pairwise comparisons showed that participants produced significantly more academic words during the CAT than the SAT and MT (p < .001 and p = .013, respectively), and produced significantly more academic words during the SAT than the MT (p = .020). DISCUSSION RQ1. Do task complexity manipulations along the ± number of elements dimension lead to systematic changes in self-ratings of cognitive load? The study investigated whether increasing task complexity led to changes in self-ratings of perceived difficulty, mental effort, and stress, which were hypothesized to increase as task complexity increased. These predictions were borne out, as the results of statistical analyses showed a positive linear relationship between task complexity and these outcome measures. 
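As a consistency check on the AWL analysis of variance reported above, partial η² can be recovered from each F ratio and its degrees of freedom using the standard identity partial η² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies that identity to the three reported tests. The function name is introduced here for illustration, and the inputs are the values printed in the text.

```python
def partial_eta_squared(f_value: float, df_effect: float, df_error: float) -> float:
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    explained = f_value * df_effect
    return explained / (explained + df_error)


# (label, F, df_effect, df_error) as reported for the AWL analysis
reported_tests = [
    ("task-type (Huynh-Feldt corrected)", 14.107, 1.604, 62.545),
    ("task complexity", 2.660, 2, 78),
    ("task-type x task complexity", 0.705, 4, 156),
]

for label, f_value, df_effect, df_error in reported_tests:
    value = partial_eta_squared(f_value, df_effect, df_error)
    print(f"{label}: partial eta squared = {value:.3f}")
```

The recovered values round to .266, .064, and .018, matching the reported effect sizes. The corrected degrees of freedom for task-type (1.604 out of 2) also imply a Huynh–Feldt epsilon of roughly .80.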
Significant interactions between task complexity and task-type were found for mental effort and stress, indicating that increases in task complexity had differential effects on these variables depending on task-type. Moreover, the task-types differed in the load they imposed: the SAT imposed the greatest cognitive load on participants, followed by the CAT, and then the MT. The similar, but not identical, patterns across the three ratings suggest that perceived difficulty, mental effort, and stress are distinct constructs that should be investigated separately as measures of cognitive load. Furthermore, significant interactions between task-type and task complexity suggest that participants' perceptions vary depending on the type of task they perform. Although all three tasks employed in the study were able to show that the versions intended to be complex were perceived as such, the most and mid-complex MT versions did not differ significantly in terms of mental effort and stress ratings. This may be due to the small increase in the number of obstacles in the MT: had more obstacles been used in the most complex version, there might have been a clearer linear relationship between task complexity and self-ratings of cognitive load. RQ2. Do task complexity manipulations along the ± number of elements dimension lead to systematic changes in duration judgments on tasks? The present study is one of the few in SLA research that utilized prospective duration judgments to measure cognitive load. According to attentional models in psychology, time estimations are determined by the amount of attention allocated to the processing of temporal information. In the prospective paradigm, attention is assumed to be shared by a non-temporal information processor and a temporal information processor (Block 1992). 
Although it is claimed that the two processors focus on different stimuli, many studies have found that some of the same attentional resources are required for processing both temporal and non-temporal information. Accordingly, it is predicted that fewer attentional resources may be allocated to temporal information when the non-temporal processing load is increased (Block 2003). Therefore, a negative linear relationship between prospective judgment length and load of non-temporal information processing is assumed. In this study, the ratios of participants’ time estimations to the actual time of planning and speech were used to compare the effects of task complexity across three task-types. It was predicted that the duration judgment ratio would decrease as task complexity increased. Results obtained from statistical analyses on speech duration estimations were consistent with this prediction. Relative to less complex versions, the most complex versions of tasks increased the load of non-temporal processing, resulting in a reallocation of attentional resources from temporal information and a decrease in speech time estimates. However, this negative relationship was not found in the case of planning-time estimations. One possible explanation could be participants’ awareness of the upper limit on planning time for each task-type: up to 2 min for the MT and CAT, and up to 5 min for the SAT. The computer beeped and the screen changed when time was up, providing additional temporal information to participants. Some of the participants knew or could guess the approximate amount of time they spent on planning, and this may account for the non-significant relationship between task complexity and planning judgment ratios. Significant effects were found for task-type on planning and speech estimations, with the highest ratios obtained for the CAT. When preparing for the task, the SAT imposed the greatest load of information-processing, followed by the MT and then the CAT. 
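The duration judgment ratio described above is the participant's time estimate divided by the actual elapsed time, so values below 1 indicate underestimation. A minimal sketch follows. The function name and the example figures are hypothetical and are not taken from the study's data.

```python
def duration_judgment_ratio(estimated_seconds: float, actual_seconds: float) -> float:
    """Ratio of a participant's time estimate to the actual elapsed time."""
    if actual_seconds <= 0:
        raise ValueError("actual duration must be positive")
    return estimated_seconds / actual_seconds


# Hypothetical trial: the participant spoke for 120 s but judged it to be 90 s.
ratio = duration_judgment_ratio(estimated_seconds=90, actual_seconds=120)
print(ratio)  # 0.75, i.e. the interval was underestimated by a quarter
```

Using a ratio rather than the raw estimate makes trials of different lengths comparable, which matters here because planning and speaking times varied across task-types. On the attentional account summarized above, the ratio is expected to fall as the non-temporal processing load rises.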
While participants were carrying out the primary task, the CAT was found to be significantly easier for them to process than the SAT. This difference between the CAT and the SAT can be explained by the nature of the tasks. The CAT simply required participants to remember and report details of a car accident video clip, while the SAT required them to arrange the best seating plan and provide reasons for their choices. There were even a few instances where participants would rearrange the plan they had originally designed while speaking. Such findings show that certain task-types may be perceived to be more complex than others. RQ3. Do task complexity manipulations along the ± number of elements dimension lead to systematic changes in dual-task measures? The dual-task method adopted in this study required participants to respond to screen color changes while simultaneously carrying out the primary task. Task complexity was predicted to affect performance on the dual task such that the number of correct responses would decrease and reaction time would increase as task complexity increased. Results of the statistical analyses were consistent with these predictions: the reaction time on the most complex versions was significantly longer than those on the mid- and least complex versions. Furthermore, a significant interaction between task-type and task complexity showed that for the MT and the SAT, the most complex versions placed greater cognitive load onto the learner than the least complex version. In fact, the SAT functioned the best at capturing cognitive complexity differences between the three task versions. Similar to the findings regarding participant self-ratings, task complexity effects were found to be moderated by the types of tasks employed. These findings are slightly different from those of Révész et al. (2014) and Révész et al. (2015). 
In their studies, accuracy rates decreased when participants performed the complex task versions, but no significant difference was found for reaction time. They concluded that accuracy is a more sensitive measure of cognitive load than reaction time for dual-task methods using screen color changes. In contrast, the present study found stronger effects for reaction time than accuracy. A possible explanation may lie in the difference in the way participants responded to color changes. In the present study, they had to choose between two keys, whereas the two earlier studies required them either to press a single key or to ignore the color change. With the additional option in the present study, ‘the required level of interference’ (Révész et al. 2015, p. 29) may have been created, so that reaction time could capture the cognitive load of the primary task. Furthermore, the participants in the present study were required to press the left shift key in response to screen color changes from white to green, and the right shift key in response to color changes from white to red. The selection of the left and right shift keys for the color changes was deliberate because it was considered to be counterintuitive: normally, green means ‘go’, which is usually associated with a right-side key, and red means ‘stop’, which is usually associated with a left-side key. Such key-color assignments may have been complex enough to capture different reaction times according to the complexity of the primary task. RQ4. Do task complexity manipulations along the ± number of elements dimension lead to systematic changes in native speakers’ oral production? So far, findings provide support for the validity of task complexity manipulations, in that complex task versions placed greater cognitive load on participants than the simpler versions, as indicated by self-ratings of cognitive load, prospective time judgments, and dual-task method outcomes. 
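The two secondary-task outcomes discussed under RQ3, accuracy and reaction time, could be scored along the following lines. This is a sketch under stated assumptions: the key mapping (green to left shift, red to right shift) comes from the text, but the record layout, the treatment of missed probes as errors, and the decision to average reaction time over correct responses only are assumptions for illustration, and the probe values are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional, Tuple

# Key assignment described in the text.
EXPECTED_KEY = {"green": "left_shift", "red": "right_shift"}


@dataclass
class Probe:
    color: str                 # "green" or "red"
    key: Optional[str]         # key pressed, or None if the probe was missed
    rt_ms: Optional[float]     # reaction time in ms, or None if missed


def score_probes(probes: List[Probe]) -> Tuple[float, Optional[float]]:
    """Return (accuracy, mean reaction time over correct responses)."""
    if not probes:
        raise ValueError("no probes to score")
    # Assumption: a missed probe counts as an incorrect response.
    correct = [p for p in probes if p.key == EXPECTED_KEY[p.color]]
    accuracy = len(correct) / len(probes)
    # Assumption: reaction time is averaged over correct responses only.
    mean_rt = mean(p.rt_ms for p in correct) if correct else None
    return accuracy, mean_rt


# Hypothetical probes from one task performance.
probes = [
    Probe("green", "left_shift", 500.0),
    Probe("red", "left_shift", 700.0),    # wrong key
    Probe("red", "right_shift", 600.0),
    Probe("green", None, None),           # missed
]
accuracy, mean_rt = score_probes(probes)
print(accuracy, mean_rt)
```

Because every probe in this design requires a choice between two keys, each response yields both an accuracy score and a reaction time, which is consistent with the explanation offered above for why reaction time was the more sensitive measure in this study.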
Another purpose of the study was to find out whether such manipulations had a positive influence on the syntactic complexity, lexical diversity, and lexical sophistication of participants’ oral production. Although significant task complexity effects were not found for the number of clauses per AS-unit and mean length of AS-unit, participants were found to produce a greater number of subordinate clauses per AS-unit when performing the mid-complex task versions than when performing the most complex and least complex versions. Moreover, the SAT generally elicited the most complex speech, followed by the CAT, and then the MT. Different patterns were found in the case of lexical diversity. Measures of Guiraud’s Index and VOCD showed that the most lexically diverse speech was produced during performance of the most complex task versions. Task complexity effects on Guiraud’s Index were also moderated by task-type, such that certain tasks showed greater complexity effects than others: the most complex versions of the SAT and the CAT were significantly better at eliciting diverse vocabulary. Guiraud’s Index and MTLD showed that vocabulary was affected by task-type such that the CAT elicited the greatest lexical diversity, followed by the SAT, and then the MT, whereas on VOCD the SAT elicited the greatest diversity. Participants also produced the highest proportion of academic words while performing the CAT, most likely because the task required them to pretend to be a news reporter, thus driving them to use language that is more formal than that used when giving directions to a friend, as in the MT. Although a linear pattern was shown for lexical diversity, an inverted V-shaped pattern was found for the number of subordinate clauses per AS-unit. Contrary to the original prediction, the mid-complex task versions elicited the most syntactically complex structures. 
The possibilities of the mid-complex versions actually being most complex or the most complex versions not being complex were ruled out due to the validating findings yielded by the cognitive load measures. A possible explanation is that participants perceived the most complex task versions to be so complex that they short-circuited the task and simplified it, either intentionally ignoring the added elements or unintentionally not being able to notice them. Regardless of the reason behind this task simplification, participants are still able to complete the task in a minimally satisfactory manner and move on to the next one. For the MT, many participants did not give explanations behind their choices of routes at all levels of task complexity. For the SAT, many did not provide reasons for their decisions on the most complex version and merely produced simple sentences. For the CAT, perhaps due to the low resolution of the video clip, many did not mention each vehicle in detail and simply ignored some of the added elements in the most complex version. Consequently, the most complex task versions failed to generate the most complex syntactic structures. In fact, they produced speech of lower syntactic complexity than the mid-complex versions. Both for laboratory research and classroom teaching materials, it is clearly going to be necessary to design tasks cleverly enough to prevent participants from ignoring elements in more complex versions, perhaps by building in task-internal feedback loops. Table 8 shows examples of task simplification and examples of proper task completion. Task-type was found to have a differential effect on participants, with the SAT eliciting the most complex structures, followed by the CAT and the MT. Whereas the less effective tasks set up a situation in which participants provided instructions (MT) or gave a report (CAT) to an unseen, imaginary audience, the SAT did not involve such an audience and was similar to a think-aloud task. 
The presence or absence of an audience may be a potential reason behind the differences in task-type effectiveness. In a controlled monologic task, such as the SAT, the exclusion of the possibility of online feedback from a live interlocutor that is typical in interactive tasks, for example, expressions of comprehension or lack thereof, may also result in potentially important differences in task-type effects. Conclusion and limitations In TBLT, it is crucial to identify principled criteria with which task-types can be classified and pedagogic tasks sequenced. The literature on the impact of task complexity on linguistic CAF measures is extensive, but results of empirical studies have been inconsistent and sometimes contradictory. To find out whether task manipulations actually lead to cognitive load changes, which in turn are assumed to produce positive changes in language production, a combination of self-ratings of cognitive load, time estimations, and the dual-task method was employed in this study. Findings provide support for the claim that increasing task complexity leads to systematic changes in cognitive load. In addition, task-type was found to play a significant role in the effects of task complexity manipulations. However, the most complex versions of tasks failed to elicit the most syntactically complex speech. Bearing that in mind, future studies might investigate the characteristics of task-types that are most effective and look into how feedback can be built into materials in such a way that participants will not be able to simplify complex tasks. More insight into these issues could be key to task classification and sequencing in TBLT and other kinds of communicative language teaching. The present study was small-scale, in a laboratory setting, with 42 native English speakers as participants. 
Native speakers were recruited so as to obtain clear evidence, unfiltered through non-native competence, that task complexity effects are real before assessing their effects on L2 performance. Various learner characteristics, such as L2 proficiency, linguistic aptitude, and working memory, may be important moderators of task complexity effects on L2 production. Once the increasing complexity of a set of tasks has been established empirically, a second phase of work can begin with L2 learners. Future research should compare the baseline performance of native speakers with that of L2 learners, thereby obtaining more confidence in the validity of task complexity manipulations and the effects of task complexity on language development. In this respect, this study should be regarded as the initial phase of a larger study to be conducted in the future. The present study contains several methodological weaknesses that should be addressed in further research. First, all participants were subjected to the dual-task methodology, which may have imposed an extra load on them. It would have been better to have had two separate groups in the study: one that performed the secondary task, and one that did not. Furthermore, although extra care was taken when operationalizing task complexity so that only ± number of elements was manipulated, there is a possibility that reasoning demands may have been affected as a result. There is a lack of clear guidance on how to operationalize task complexity in previous research, and ± number of elements and ± Here-and-now are among the easiest dimensions to manipulate, considering that their names are fairly self-explanatory. Nonetheless, extra cautionary steps should be taken when designing tasks and manipulating task complexity because other dimensions may be affected as well. SUPPLEMENTARY DATA Supplementary material is available at Applied Linguistics online. 
Jiyong Lee is a PhD student in the Second Language Acquisition program at the University of Maryland. Her research interests include task complexity effects on L2 performance, the relationships among task complexity, language aptitude, working memory, and negative feedback, and age effects and maturational constraints in SLA. Address for correspondence: Jiyong Lee, University of Maryland, 1102 Francis Scott Key Hall, College Park, MD 20742, USA. <jlee0123@umd.edu>

Acknowledgements

The author would like to express her greatest appreciation to Dr Michael Long for his valuable suggestions and guidance through the planning and development of this research work. The author would also like to thank Dr Steven Ross and Dr Dan McNeish for their advice on data analyses. The author’s gratitude also extends to Dr Nan Jiang for his support for this project. Special thanks to several colleagues and friends at the University of Maryland who provided valuable feedback. The author is also grateful to the anonymous reviewers for their efforts and insightful suggestions.

References

Baralt M. L. 2013. ‘The impact of cognitive complexity on feedback efficacy during online versus face-to-face interactive tasks,’ Studies in Second Language Acquisition 35: 689–725.
Block R. A. 1992. ‘Prospective and retrospective duration judgment: The role of information processing and memory’ in Macar F., Pouthas F., Friedman W. J. (eds): Time, Action and Cognition: Towards Bridging the Gap. Springer Science & Business Media.
Block R. A. 2003. ‘Psychological timing without a timer: The roles of attention and memory’ in Helfrich H. (ed.): Time and Mind II. Hogrefe Publishing.
Block R. A., Hancock P. A., Zakay D. 2010. ‘How cognitive load affects duration judgments: A meta-analytic review,’ Acta Psychologica 134: 330–43.
Block R. A., Zakay D. 2008. ‘Timing and remembering the past, the present, and the future’ in Grondin S. (ed.): Psychology of Time. Emerald Group Publishing Ltd.
Brown S. W. 1985. ‘Time perception and attention: The effects of prospective versus retrospective paradigms and task demands on perceived duration,’ Perception and Psychophysics 38: 115–24.
Brunken R., Plass J. L., Leutner D. 2003. ‘Direct measurement of cognitive load in multimedia learning,’ Educational Psychologist 38: 53–61.
Cierniak G., Scheiter K., Gerjets P. 2009. ‘Explaining the split-attention effect: Is the reduction of extraneous cognitive load accompanied by an increase in germane cognitive load?,’ Computers in Human Behavior 25: 315–24.
Cobb T. 2002. ‘Web Vocabprofile [An adaptation of Heatley, Nation & Coxhead's (2002) Range],’ available at http://www.lextutor.ca/vp/. Accessed August 2017.
Coxhead A. 2000. ‘A new academic word list,’ TESOL Quarterly 34: 213–38.
Ellis D. 2011. ‘The role of task complexity in the linguistic complexity of native speaker output,’ Qualifying paper, PhD in Second Language Acquisition Program. University of Maryland.
Foster P., Skehan P. 1996. ‘The influence of planning and task type on second language performance,’ Studies in Second Language Acquisition 18: 299–323.
Foster P., Tavakoli P. 2009. ‘Native speakers and task performance: Comparing effects on complexity, fluency, and lexical diversity,’ Language Learning 59: 866–96.
Gilabert R. 2007. ‘Effects of manipulating task complexity on self-repairs during L2 oral production,’ International Review of Applied Linguistics in Language Teaching 45: 215–40.
Gilabert R., Barón J., Levkina M. 2011. ‘Manipulating task complexity across task types and modes’ in Robinson P. (ed.): Second Language Task Complexity: Researching the Cognition Hypothesis of Language Learning and Performance. John Benjamins Publishing Company.
Gilabert R., Barón J., Llanes À. 2009. ‘Manipulating cognitive complexity across task types and its impact on learners' interaction during oral performance,’ International Review of Applied Linguistics in Language Teaching 47: 367–95.
Guiraud P. 1954. Les Caractères Statistiques du Vocabulaire. Essai de méthodologie. Presses Universitaires de France.
Ishikawa T. 2011. ‘Examining the influence of intentional reasoning demands on learner perceptions of task difficulty and L2 monologic speech’ in Robinson P. (ed.): Second Language Task Complexity: Researching the Cognition Hypothesis of Language Learning and Performance. John Benjamins Publishing Company.
Jackson D. O., Suethanapornkul S. 2013. ‘The cognition hypothesis: A synthesis and meta-analysis of research on second language task complexity,’ Language Learning 63: 330–67.
Kim Y. 2009. ‘The effects of task complexity on learner–learner interaction,’ System 37: 254–68.
Kim Y., Payant C., Pearson P. 2015. ‘The intersection of task-based interaction, task complexity, and working memory,’ Studies in Second Language Acquisition 37: 549–81.
Long M. H. 1985. ‘A role for instruction in second language acquisition: task-based language teaching’ in Hyltenstam K., Pienemann M. (eds): Modelling and Assessing Second Language Acquisition. Multilingual Matters Ltd.
Long M. H. 1996. ‘The role of the linguistic environment in second language acquisition’ in Ritchie W. R., Bhatia T. J. (eds): Handbook of Second Language Acquisition. Academic Press.
Long M. H. 2015. ‘Task-based syllabus design’ in Long M. (ed.): Second Language Acquisition and Task-Based Language Teaching. Wiley.
Long M. H., Crookes G. 1992. ‘Three approaches to task-based syllabus design,’ TESOL Quarterly 26: 27–56.
Malicka A., Levkina M. 2012. ‘Measuring task complexity: Does L2 proficiency matter,’ Task-Based Language Teaching in Foreign Language Contexts: Research and Implementation: 43–66.
McNamara D. S., Louwerse M. M., Cai Z., Graesser A. 2013. ‘Coh-Metrix version 3.0,’ available at http://cohmetrix.com. Accessed August 2017.
Michel M. C. 2011. ‘Effects of task complexity and interaction on L2 performance,’ Second Language Task Complexity: Researching the Cognition Hypothesis of Language Learning and Performance 2: 141–73.
Michel M. C., Kuiken F., Vedder I. 2007. ‘The influence of complexity in monologic versus dialogic tasks in Dutch L2,’ International Review of Applied Linguistics in Language Teaching 45: 241–59.
Norris J. M. 2010. ‘Understanding instructed SLA: Constructs, contexts, and consequences,’ Plenary address delivered at the annual conference of the European Second Language Association (EUROSLA), Reggio Emilia.
Norris J. M., Ortega L. 2009. ‘Towards an organic approach to investigating CAF in instructed SLA: The case of complexity,’ Applied Linguistics 30: 555–78.
Révész A. 2014. ‘Towards a fuller assessment of cognitive models of task-based learning: Investigating task-generated cognitive demands and processes,’ Applied Linguistics 35: 87–92.
Révész A., Sachs R., Hama M. 2014. ‘The effects of task complexity and input frequency on the acquisition of the past counterfactual construction through recasts,’ Language Learning 64: 615–50.
Révész A., Michel M., Gilabert R. 2015. ‘Measuring cognitive task demands using dual task methodology, subjective self-ratings, and expert judgments: a validation study,’ Studies in Second Language Acquisition 28: 1–35.
Révész A., Kourtali N. E., Mazgutova D. 2017. ‘Effects of task complexity on L2 writing behaviors and linguistic complexity,’ Language Learning 67: 208–41.
Robinson P. 1995. ‘Task complexity and second language narrative discourse,’ Language Learning 45: 99–140.
Robinson P. 2001a. ‘Task complexity, cognitive resources, and syllabus design: A triadic framework for examining task influences on SLA’ in Robinson P. (ed.): Cognition and Second Language Instruction. Cambridge University Press.
Robinson P. 2001b. ‘Task complexity, task difficulty, and task production: Exploring interactions in a componential framework,’ Applied Linguistics 22: 27–57.
Robinson P. 2003. ‘The cognitive hypothesis, task design, and adult task-based language learning,’ Second Language Studies 21: 45–105.
Robinson P. 2005. ‘Cognitive complexity and task sequencing: studies in a componential framework for second language task design,’ International Review of Applied Linguistics in Language Teaching 43: 1–32.
Robinson P. 2007. ‘Criteria for classifying and sequencing pedagogic tasks’ in Mayo M. G. (ed.): Investigating Tasks in Formal Language Learning. Multilingual Matters Ltd.
Robinson P. 2011. ‘Second language task complexity, the cognition hypothesis, language learning, and performance’ in Robinson P. (ed.): Second Language Task Complexity: Researching the Cognition Hypothesis of Language Learning and Performance. John Benjamins Publishing Company.
Rostamian M., Fazilatfar A. M., Jabbari A. 2017. ‘The effect of planning time on cognitive processes, monitoring behavior, and quality of L2 writing,’ Language Teaching Research: 1–21.
Sasayama S. 2016. ‘Is a ‘complex’ task really complex? Validating the assumption of cognitive task complexity,’ The Modern Language Journal 100: 231–54.
Skehan P. 1996. ‘A framework for the implementation of task-based instruction,’ Applied Linguistics 17: 38–62.
Skehan P. 1998. A Cognitive Approach to Language Learning. Oxford University Press.
Skehan P. 2014. Processing Perspectives on Task Performance. John Benjamins Publishing Company.
Skehan P., Foster P. 1997. ‘Task type and task processing conditions as influences on foreign language performance,’ Language Teaching Research 1: 185–211.
Zakay D. 1992. ‘On prospective time estimation, temporal relevance and temporal uncertainty’ in Macar F., Pouthas F., Friedman W. J. (eds): Time, Action and Cognition: Towards Bridging the Gap. Springer Science & Business Media.

© Oxford University Press 2018. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices)

Journal

Applied Linguistics, Oxford University Press

Published: Jan 2, 2018
