Moran, Meghan

Abstract

With the ascendancy of English as a global lingua franca, a clearer understanding of what constitutes intelligible speech is needed. However, research systematically investigating the threshold of intelligibility has been very limited. In this article, we provide a brief summary of the literature as it pertains to intelligible and comprehensible speech, and then report on an exploratory study seeking to determine what specific features of accented speech make it difficult for global listeners to process. Eighteen speakers representing six English varieties were recruited to provide speech stimuli for two English listening tests. Sixty listeners from the same six English varieties took part in the listening tests, and their scores were then assessed against measurable segmental, prosodic, and fluency features found in the speech samples. Results indicate that it is possible to identify particular features of English speech varieties that are most likely to lead to a breakdown in communication, and that the number of such features present in a particular speaker’s speech can predict intelligibility.

As English has become increasingly dominant as the primary language of global communication, speakers of those varieties historically considered to be standard (British, American, etc.) are now in the minority among its users (Li 2009). In fact, English is being appropriated in various ways by the communities and nations who are increasingly using it. The World Englishes paradigm,1 as discussed by Kachru (1992) and Nelson (2011), has been influential in establishing that many widely established English varieties spoken around the globe are legitimate means of communication. Furthermore, in some contexts, so-called non-standard varieties may actually be preferable to standard varieties. 
Examples of this would include English varieties spoken in India and Nigeria as official languages (or even first languages) of a subset of the population, or English varieties used as a lingua franca across state boundaries, such as when Koreans and Mexicans are engaged in commercial trade with each other. Given the wide range of contexts in which English is spoken, one can argue that imposing traditionally standard varieties (e.g. British and American English) in English as an International Language (EIL) contexts is inappropriate, since other varieties are already established as effective and more appropriate modes of communication (Hamp-Lyons and Davies 2008). Given this reality, the notion that one particular variety should be the standard-bearer for English is increasingly unjustified. In response, some applied linguists have begun promoting EIL as the goal to which most speakers ought to aspire (Jenkins 2006). EIL is not so much a standard as a negotiation of form and meaning in interaction. An EIL approach recognizes that the English language belongs to all who use it, rather than to speakers of traditionally privileged varieties alone. In the realm of English pronunciation, advocates of EIL emphasize mutual intelligibility over mastery of a particular native accent (Yano 2001; Jenkins 2006). However, what constitutes mutual intelligibility varies from context to context. In addition, even the features that comprise the intelligibility of specific accented varieties (on the part of the speaker alone) have not yet been fully established. Despite a growing recognition that the pronunciation of many international English varieties is intelligible to a wide range of listeners, many EIL speakers continue to believe that they should aim for pronunciation associated with historically prestigious varieties (e.g. American and British). 
In part, this belief may be reinforced by the absence or near absence of so-called non-standard varieties of English in teaching materials (e.g. coursebooks) as well as on tests of English proficiency, which privilege British or American English varieties. While an emphasis on these varieties may have had merit several generations ago, the current reality of EIL, even within traditionally majority-English-speaking countries, compromises the ecological validity of using only these varieties in both pedagogy and assessment. Therefore, many scholars in recent years have suggested the inclusion of accented varieties of English in high-stakes English assessments (Taylor 2006; Abeywickrama 2013; Ockey and French 2016; Ockey et al. 2016). However, there are legitimate concerns with incorporating a variety of English accents into the assessment of L2 listening. First, including accents with which listeners are unfamiliar may actually be seen as lacking in ecological validity, unless the test-taker aspires to understand such accents because of some utilitarian purpose. Nelson (2011) argues that intelligibility of language is contextually determined, and that there must be some communicative usefulness associated with understanding a particular variety. Otherwise, including a range of English varieties on a test of English proficiency may be nothing more than an ideologically motivated enterprise. For example, a Brazilian learner of English who wishes to remain in Brazil and is taking an English exam only for employment purposes may have no need to understand Japanese-accented English. However, tests such as the Test of English as a Foreign Language (TOEFL) and International English Language Testing System, which are marketed as international rather than local, could strengthen their validity by incorporating different varieties of English into their listening sections. 
Assuming that the use of English varieties does, in fact, fit with the test purpose, a second concern is encountered: incorporating a variety of English accents may disadvantage certain test-takers who have not had substantial experience with those accents (Taylor and Geranpayeh 2011). Ockey and French (2016: 19) go so far as to say that unless carefully validated, the use of a range of unfamiliar accents for listening comprehension assessment is ‘professionally irresponsible’. Even if items produced by particular accents are validated (i.e. found to be free of bias), it is a challenge to determine which accents or English varieties should be represented; incorporating every type of accent available in the target language use domain is impossible from a practical standpoint (Elder and Harding 2008). Rather, the types of accents used should reflect what real-world listeners might encounter in the context of EIL. Further, even within each variety of English, individual speakers may be more or less intelligible to EIL listeners. That is, we cannot assume that all speakers of a particular English variety speak in a way that is sufficiently intelligible to all proficient listeners. Some listeners may process speech produced by less intelligible speakers with less effort due to previous exposure to that particular variety and its range of speakers (Gass and Varonis 1984). What is needed are input texts that are equally intelligible to all listeners, regardless of their previous exposure to particular varieties. Using mock TOEFL listening comprehension tests as a test case, Ockey and French (2016) demonstrated that when listening passages were spoken by some speakers with varied but highly intelligible English accents (with a strength of accent below 2.0 on their scale), their inclusion did not significantly affect comprehension scores. However, for speakers with mild accents (i.e. 
2.0–2.6 on the strength of accent scale), listening comprehension scores were negatively affected. It is important to keep in mind that Ockey and French’s (2016) study focused on English varieties from English-majority countries, which can be assumed to be more familiar to a greater number of listeners, and therefore easier to understand. The process by which stimulus talkers from a wider range of English varieties are selected for inclusion is far more complex. It requires an empirically based approach to determining an intelligibility threshold for any English variety used in a test of listening comprehension, whether those varieties are familiar to listeners or not.

1. INTELLIGIBILITY, COMPREHENSIBILITY, AND ACCENTEDNESS

Following Munro and Derwing (1995) and Derwing and Munro (1997), three important constructs have come to dominate the pronunciation literature: intelligibility, comprehensibility, and accentedness. According to Derwing and Munro, intelligibility refers to the extent to which the speaker’s intended utterance is actually understood by a listener (often measured via transcription), whereas comprehensibility pertains to the degree of difficulty the listener experiences in attempting to understand an utterance. While Munro and Derwing (1995) found the first two constructs to be quite highly intercorrelated, accentedness (meaning the extent to which an L2 learner’s speech is perceived to differ from a particular standard) was found to be only moderately correlated with comprehensibility and weakly correlated with intelligibility. That is, a given speaker could have a reportedly strong foreign accent, yet still be highly intelligible. Thus, the presence of a foreign accent does not necessarily imply reduced intelligibility (Harding 2011). 
A consequence of these facts is that establishing what constitutes accented but highly intelligible English speech can provide a means for incorporating a variety of English accents into high-stakes tests of English listening proficiency. Establishing intelligibility is a complex endeavor, since it is partially influenced by factors beyond a given speaker’s control. Specifically, intelligibility (and by extension comprehensibility) has an interactional dimension (Smith and Nelson 1985), with listeners playing a role (Fayer and Krasinski 1987). For example, intelligibility and comprehensibility can be affected by listeners’ attitudes, expectations, and stereotypes that they may associate with particular accents (Rubin 1992; Kang and Rubin 2009; Lippi-Green 2012).

2. Establishing an empirically motivated threshold of intelligibility

Previously, the threshold of intelligibility has been defined as the lowest requirement for efficiently conveying a message from a native listener’s standpoint (Gimson 1980). Extending this to the EIL context, the threshold of intelligibility can be viewed as the point at which speech is considered just good enough for successful communication between particular speakers and their interlocutors. Unlike comprehensibility and accentedness, which are impressionistic judgments on the part of listeners, intelligibility is the extent to which a listener can correctly transcribe words that they hear. In this study, in particular, intelligibility is operationalized and defined even more narrowly in phonological and perceptual terms. To date, definitions of what constitutes intelligibility are rather opaque. What counts as successful communication depends on many factors, including context, the interlocutors involved in the communication, prior experience of listeners with particular accents, etc. Thus, what comprises a threshold of intelligibility must be operationalized in context-specific ways. 
For example, a high-stakes English test such as the TOEFL requires speakers who deliver the listening section of the test to exhibit a degree of general intelligibility sufficiently high to reduce any bias effects. The current study seeks to establish a threshold of intelligibility across a variety of English accents in the context of a TOEFL-type monologic listening comprehension test. Given the nascent state of this field of inquiry, our study is quite exploratory in nature. In it, we specifically aim to identify which phonological features of six English varieties collectively contribute to establishing a threshold of acceptable intelligibility for high-stakes test-takers, regardless of what accent is used in such tests.

3. Features that affect intelligibility and listening comprehension

While what constitutes intelligibility remains under-defined, existing literature provides a useful starting point for considering which features contribute to speech that is just good enough. In the realm of segmentals (i.e. vowels and consonants), Catford (1987) argues that all ‘phonemic contrasts carry a certain functional load, based upon the number of pairs of words in the lexicon that serves to keep [the contrast] distinct’ (p. 88). This research led to a list of all English phonemes organized to reflect the frequency with which each phoneme contrasts with other phonemes. This, in turn, allowed for each phoneme to be ranked from high to low on a functional load continuum. Brown (1991) proposes a more fine-tuned functional load hierarchy, which takes into account other factors, such as whether two words that contrast by a single phoneme are from the same part of speech and whether members of a contrast are similar sounding. Munro and Derwing (2006) sought to empirically validate the importance of functional load by examining the relative contributions of high versus low functional load phonemes to the comprehensibility of foreign accented speech. 
They found that speakers’ substitution of certain high functional load consonants with contrasting phonemes (e.g. /∫/ → /s/, /n/ → /l/, /s/ → /∫/, /d/ → /z/) caused listeners to rate speakers’ comprehensibility much lower than in cases where substitutions involved low functional load consonants (e.g. the /ð/ and /d/ opposition; /θ/ → /f/). Likewise, Jenkins (2003) based her description of what constitutes a Lingua Franca Core for EIL on empirical research, also having found that some phonemes are relatively less important to successful communication than others. For example, despite their relatively high frequency of occurrence, voiced and voiceless interdental fricatives rarely cause confusion on the part of the listener when substituted with other contrasting phonemes. This is consistent with Catford’s (1987) position that functional load is defined not simply by the frequency of occurrence of particular phonemes, but rather by the frequency with which they contrast with other phonemes to form distinct words (i.e. minimal pairs). In addition to literature on the contribution of segmentals to listeners’ ability to understand speech produced in accents other than their own, an increasing number of studies have addressed the importance of non-native English speakers’ (NNES) prosody (suprasegmentals) in listeners’ judgments of comprehensibility and oral proficiency (Kang et al. 2010). For example, at the level of phrases and sentences, incorrect nuclear stress (i.e. emphasizing the ‘wrong’ words) can affect listeners’ comprehension of content (Field 2005). Similarly, the use of stress to emphasize every word, regardless of its function or semantic importance, causes difficulty for listeners (Wennerstrom 2000; Kang 2010). Poor intonational structure (e.g. narrow pitch range in Kang 2010) and a disturbance in prosodic composition can also considerably affect native-speaker listeners’ perceptions (Pickering 2001). 
For example, the intonational characteristics of English produced by many East Asian speakers can cause US listeners to lose concentration or to misunderstand the speaker’s intent (Kang 2012). In particular, how a speaker applies rising, falling, or level pitch on the focused word of a tone unit can affect both perceived information structure and social cues in L2 discourse. Finally, numerous studies have investigated the relationship between fluency and the comprehensibility of speech. Tavakoli and Skehan (2005) suggest that fluency can be characterized along three separate dimensions: (i) speed and density per time unit (speaking rate), (ii) breakdown fluency (number and length of pauses), and (iii) repair fluency. Furthermore, research indicates that speech rate (Kormos and Denes 2004), breakdown fluency (Derwing et al. 2004; Kang et al. 2010), and repair fluency (Iwashita et al. 2008) are all associated with listeners’ comprehension of speech as well as with assessments of speakers’ oral proficiency. Thomson (2015) reports a strong correlation between oral fluency ratings and comprehensibility ratings, although he highlights the fact that in rare cases, a speaker can be fluent, yet not very easy to understand. While the literature described above reveals important variables that contribute to the intelligibility of speech, we know little about how these features interact with specific English accents and specific features of those accents.

4. RESEARCH QUESTIONS

The current study is guided by the following research questions:

1. To what extent do unique phonological features affect listeners’ comprehension scores and intelligibility scores?

2. How might these phonological features be used to establish a threshold of intelligibility for a variety of English accents?

Phonological features were operationalized using both segmental and suprasegmental measures. 
We aimed to identify a threshold of intelligibility for TOEFL-type listening comprehension passages produced with varied accents by examining which features of speech typically associated with intelligibility contribute to the comprehension of test passages by typical TOEFL test-takers.

5. Method

5.1 Creation of test materials

5.1.1 Speakers

Eighteen speakers, three from each of six distinct groups, were recruited. Two groups represent traditionally standard English accents, General American (GA) and British Received Pronunciation (RP); two groups represent traditionally non-standard English accents spoken in contexts where English is an official language, but not the first language of the speakers, South African Afrikaans (SA) and Indian Hindi (IN); and two groups represent English accents spoken by EIL speakers where English is not an official language, but is used for international communication, Mexican Spanish (SP) and Chinese Mandarin (CH). One female and two male speakers per country were included. The speakers’ L1s and geographical origins were controlled to ensure a measure of homogeneity of dialect. An iterative pre-screening process was followed to arrive at our final set of test speakers. First, we recorded potential speakers reading TOEFL listening comprehension test passages (but not the task to be used in the study). Then, following Major et al.’s (2002) recommendations, potential speakers were informally assessed by the research team to ensure that they (i) sounded conversational as a lecturer, (ii) exhibited characteristics of pronunciation typical of their L1 speech community, (iii) handled the terminology of TOEFL lecture scripts fluently so as to appear to be experts in the field represented by TOEFL listening passages, and (iv) had the mature voice quality and pitch of a professional or academic speaker. 
Speakers from traditionally non-standard English varieties, who were highly proficient in English, but who still retained phonological features distinct from those found in historically ‘standard’ varieties (e.g. GA and RP), were identified. The speakers who were ultimately selected included those who were highly intelligible/comprehensible, yet still accented (Harding 2011), as well as those with varying degrees of perceived comprehensibility. These comprehensibility levels were determined by eight raters with graduate degrees in Applied Linguistics and backgrounds in phonology and pronunciation. The raters provided a scalar rating of the speakers’ sample lecture recordings, using a five-point scale with ‘easy to understand’ and ‘difficult to understand’ as end points. Using mean rating scores, we selected three speakers from each traditionally non-standard variety to represent low (4–5 out of 5), mid (3 out of 5), and high (1–2 out of 5) degrees of comprehensibility, which, as noted earlier, is highly correlated with measures of intelligibility.

5.1.2 Speaking tasks

Our final set of 18 speakers were asked to record themselves reading two text tasks: (i) a randomly assigned listening stimulus passage (3–5 min) from the TOEFL iBT listening texts of academic lectures and (ii) a series of 90 nonsense sentences. The nonsense sentences were adopted from previous research on the L1 intelligibility of individuals with hearing difficulties or disorders (Nye and Gaitenby 1974; Picheny et al. 1985).

5.1.3 TOEFL listening passages

Twenty-three TOEFL passages were provided by the Educational Testing Service. The passages corresponded to the 3–5 min monologic lecture portion of the TOEFL. 
The subsequent questions are intended to ‘assess test takers’ ability to: understand main ideas or important details;…understand the organization of the information presented; understand relationships between the ideas presented; and make inferences or connections among pieces of information’ (TOEFL iBT Test Framework and Test Development 2010). Questions are mostly multiple choice with one correct answer, although some questions have more than one answer and therefore allow for partial credit. On the survey, prior to each recording, there was a sentence of introduction such as, ‘Listen to part of a talk in a United States history class’. Other lecture topics included science, world history, astronomy, literature, language, and the arts. A screening and validation process was independently conducted by the researchers to finalize a set of 18 of the 23 lecture scripts and related questions that had similar ranges in item difficulty and discrimination. Forty-five participants (35 TOEFL preparatory students and 10 graduate students) took the 23 listening tests recorded by a GA-accented speaker to determine the difficulty of the test items and the familiarity of the passages. Classical item analysis was initially computed on participant responses to obtain index values of item difficulty and discrimination. Many-Facet Rasch Measurement (MFRM) was also performed as an additional validation process. The items retained by both of these methods were ultimately selected for the study. Based on the results of the item difficulty analyses and passage familiarity scores, we removed five passages from further use because their difficulty levels deviated markedly from those of the rest. The items finally selected for the current study ranged in difficulty between 0.63 (the most difficult) and 0.89 (the easiest). Note that classical item analysis statistics are sample-dependent; accordingly, the passages selected are relative to the population of the current study. 
Infit values from the MFRM analyses were also considered in this screening process, since MFRM fit statistics make it possible to eliminate problematic or misfitting elements. In general, Infit has an expected value of 1.0; for mean squares to be useful for practical purposes, a reasonable range of Infit values between 0.5 and 1.5 has been recommended (Lunz and Stahl 1990; Linacre and Wright 2002), and values close to or above 2.0 are considered distorting (Myford and Wolfe 2004). Accordingly, the current study excluded two passages with logit values of −1.05 and 0.36: the former item (−1.05) was notably easier than the others, and the latter (0.36) notably more difficult. Another passage had to be removed because it contained an item with a high Infit value (1.96), even though it was not one of the easiest. We then deleted two more passages, which contained the next most difficult item (logit = 0.3) and the next easiest item (logit = −0.8), to keep the average difficulty logits similar across the passages. A few of the remaining items were relatively difficult, but not markedly different from the rest; overall, most items clustered around the mean (0). As for the test of unidimensionality, the observed mean square values (MNSQ: 0.58–1.96) fell slightly outside the acceptable range suggested by McNamara (1996) (MNSQ: 0.75–1.3; ZSTD: −2 to +2). However, item reliability was relatively high (0.88). Moreover, a brief vocabulary analysis of all 18 passages showed that every passage fell within a type-token ratio of 0.41–0.49, 75–81 per cent coverage by the most common 1,000 words, and 3.7–4.4 per cent use of Academic Word List items. Finally, the selected passages had familiarity scores ranging from 5.17 to 5.92 out of 7 (1 = very familiar and 7 = not familiar at all). 
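The two-stage item screening described above can be sketched in a few lines of code. This is an illustrative reconstruction under stated assumptions, not the study’s actual analysis script: the response data and function names are hypothetical, and only the retained difficulty range (0.63–0.89) and the recommended Infit band (0.5–1.5) come from the text. Computing the Rasch Infit statistic itself requires dedicated MFRM software and is treated here as a given input.

```python
# Sketch of the item-screening logic: classical item difficulty
# (proportion of correct responses) plus a Rasch Infit mean-square band.
# Thresholds follow the text; the data below are invented for illustration.

def item_difficulty(responses):
    """Classical difficulty: proportion of test-takers answering correctly."""
    return sum(responses) / len(responses)

def keep_item(responses, infit_mnsq,
              diff_range=(0.63, 0.89), infit_range=(0.5, 1.5)):
    """True if the item survives both screening criteria."""
    p = item_difficulty(responses)
    return (diff_range[0] <= p <= diff_range[1]
            and infit_range[0] <= infit_mnsq <= infit_range[1])

# Hypothetical responses from ten test-takers: 1 = correct, 0 = incorrect.
easy_item = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]   # difficulty 0.9: too easy
ok_item   = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]   # difficulty 0.7: retained
print(keep_item(easy_item, infit_mnsq=1.1))  # False
print(keep_item(ok_item, infit_mnsq=1.1))    # True
print(keep_item(ok_item, infit_mnsq=1.96))   # False (misfitting)
```

In the study itself, items flagged by either method (difficulty deviation or misfit) led to whole passages being dropped, not just single items.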
Passages and items were then evenly and systematically distributed across speakers of the six L1s, according to topics, item difficulty for each testlet, and other task features. The final set of official TOEFL-type materials in the first task was controlled for style (a professor’s monologic speech), length (passages of 500–800 words), type and number of questions (6), and content (topics deemed appropriate for university-level students). All the listening passages assigned to each speaker were similar in style but different in topic; that is, speakers recorded different listening passages. To avoid a speaker’s speech rate becoming a confounding variable for intelligibility (Derwing and Munro 2001), we also ensured that final speech files fell within the range of 2.2–2.8 words (approximately 3.2–3.6 syllables) per second by having speakers re-record them if they did not hit this target on the first pass (which most of them did). In Derwing and Munro’s (2001) study, speech rate was found to be an integral aspect of intelligibility; in the current study, we intentionally controlled for speech rate to neutralize a ‘nuisance’ variable. While speech rate is difficult to control in authentic conversational situations, it can easily be controlled for in a listening passage lecture. The final 18 speakers’ recordings were also evaluated by 48 novice listeners who provided scalar judgments of both strength of accent and comprehensibility. These raters included a mixture of undergraduate students (19), teachers (8), and graduate students (21). For each speaker, listeners were asked to complete two 7-point Likert scale items to reflect accent and comprehensibility: 1 = ‘no accent’/7 = ‘heavy accent’, and 1 = ‘easy to understand’/7 = ‘difficult to understand’. 
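The speech-rate control above amounts to a simple acceptance check on each recording. The sketch below is an illustrative assumption about how such a check could be automated (the study does not describe its tooling); only the target band of 2.2–2.8 words per second comes from the text.

```python
# Hypothetical speech-rate check for a recorded passage: a file passes
# only if its rate falls within 2.2-2.8 words per second, the band the
# study used; otherwise the speaker would be asked to re-record.

def words_per_second(word_count, duration_s):
    """Speech rate in words per second for a recording of known length."""
    return word_count / duration_s

def within_target_rate(word_count, duration_s, low=2.2, high=2.8):
    """True if the recording needs no re-recording for speech rate."""
    return low <= words_per_second(word_count, duration_s) <= high

# A hypothetical 650-word passage read in 260 seconds: 2.5 words/second.
print(within_target_rate(650, 260))   # True
# The same passage rushed through in 200 seconds: 3.25 words/second.
print(within_target_rate(650, 200))   # False
```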
Accent ratings for the three speakers in each country were as follows: 1.21, 1.27, and 1.33 for American speakers; 2.33, 2.56, and 2.62 for British speakers; 3.44, 4.29, and 5.11 for Indian speakers; 3.68, 5.27, and 5.53 for South African speakers; 3.60, 5.60, and 5.84 for Mexican speakers; and 2.04, 4.13, and 5.67 for Chinese speakers. Although the scales used in the preliminary rating and by the 48 novice listeners were different, the comprehensibility ratings assigned by the novice raters roughly matched those given by the expert raters. Comprehensibility ratings showed a very similar pattern in that while GA and RP speakers did not demonstrate variance in their comprehensibility, the speakers of other English accents did. High-comprehensibility speakers ranged from 1 to 2, mid-comprehensibility speakers ranged from 3 to low 4, and low-comprehensibility speakers ranged from mid 4 to 5. As a result, GA and RP speaker recordings were treated as equivalent in their relative degree of comprehensibility, while recordings from the speakers of the other English accents could be labeled low, mid, and high in terms of their relative comprehensibility. The mean differences in both the comprehensibility and accentedness scores among these three levels were statistically significant (p < .05).

5.1.4 Nonsense sentences

A measure of intelligibility using nonsense sentences is one of the most common methods used in the field of speech-language pathology to investigate the clarity of speech, but it has been underutilized in the field of applied linguistics. It was one of five measurement techniques used as part of the larger project from which this study stems (Kang et al. 2018a), which explored how best to operationalize intelligibility. We selected this particular approach to examine the intelligibility of different varieties of Englishes because it was most significantly associated with listeners’ comprehension scores in that larger study (Kang et al. 2018a). 
Specific nonsense sentence items were chosen from the test banks used in Nye and Gaitenby (1974) and Picheny et al. (1985). The sentences were semantically meaningless while being syntactically normal, and they were composed of high-frequency monosyllabic English words. Sentences taken from Nye and Gaitenby (1974) all followed the pattern ‘The (adjective) (noun) (verb, past tense) the (noun).’; those from Picheny et al. were slightly more grammatically complex but had four distinct content words (e.g. The tall kiss can draw with an oak.). Therefore, listeners were asked to fill in four missing content words per sentence. The final list included 72 sentences (4 sentences × 18 speakers): from each speaker’s recordings, a subset of four sentences was randomly selected for use in the evaluations. Accordingly, the possible range of this intelligibility measure for each speaker is 0 to 16 (4 missing words × 4 sentences). The sentences were presented auditorily on the survey with four blanks beneath each recording. Participants were asked to type each of the four words they heard into the corresponding blank. Manual coding allowed incorrect spellings to count as correct, as long as they were interpretable. Likewise, homophones were accepted because of the lack of semantic context in the sentences. All recorded files were converted to .wav format using the program Freemake Audio Converter and screened for sound quality and volume. They were also edited visually and aurally from a waveform display using PRAAT. Any vocal dysfluencies, such as throat clearings or restarts, were deleted. The edited sound files were then embedded into several surveys, after randomization, using the assessment tool SurveyGizmo. 
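The scoring rule above (exact match, interpretable misspellings, and homophones all count as correct, four words per sentence) can be illustrated with a small sketch. The study’s coding was done manually; everything below is a hypothetical automation of it, with an invented homophone table and a character-similarity threshold standing in for the human judgment of what counts as ‘interpretable’.

```python
# Illustrative scoring of one nonsense-sentence transcription (0-4 per
# sentence; four sentences per speaker give the 0-16 scale in the text).
# The homophone table and 0.8 similarity cutoff are assumptions, not
# the study's actual criteria, which were applied by hand.
import difflib

HOMOPHONES = {"oak": {"oke"}}  # invented example entry

def word_correct(target, response):
    """True if the typed word matches the target under the scoring rule."""
    target, response = target.lower().strip(), response.lower().strip()
    if response == target or response in HOMOPHONES.get(target, set()):
        return True
    # Tolerate interpretable misspellings via character-level similarity.
    return difflib.SequenceMatcher(None, target, response).ratio() >= 0.8

def score_sentence(targets, responses):
    """Number of the four content words transcribed correctly (0-4)."""
    return sum(word_correct(t, r) for t, r in zip(targets, responses))

targets = ["tall", "kiss", "draw", "oak"]
print(score_sentence(targets, ["tall", "kis", "drew", "oke"]))  # → 3
```

Here ‘kis’ passes as an interpretable misspelling, ‘oke’ passes via the homophone table, and ‘drew’ fails because it is a different word rather than a spelling slip.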
5.2 Listeners

Sixty listeners from the same six countries represented in our listening materials (the TOEFL monologic lectures) were recruited to take the TOEFL listening and intelligibility tests we had created: 10 American (three males and seven females), 10 British (four males and six females), 10 Indian (five males and five females), 10 non-Anglophone South African (six males and four females), 10 Chinese (three males and seven females), and 10 Spanish (five males and five females). Listeners were senior undergraduate university students or early-stage graduate students. All NNES listeners were highly proficient in English; that is, they had received TOEFL iBT scores of 100 or higher. Their ages ranged from 18 to 32. All listeners were asked to complete a short demographic survey and a short diagnostic test. The diagnostic test consisted of a one-passage listening test with six questions derived from a currently available TOEFL iBT practice test and produced by a standard GA speaker. Listeners who performed successfully on the practice section, with one or no incorrect answers, were invited to complete the study.

5.3 Data collection procedures

The focus of this project involves one listening comprehension test and one intelligibility measure: transcription of four missing content words in semantically nonsensical sentences. Listeners were required to take the tests on two different days: the listening comprehension test on Day 1 (18 passages in total, for approximately 2 h) and the nonsense sentence intelligibility test on Day 2 (30 min). They were instructed to complete each day’s test in one sitting. The listeners were allowed to take notes for the comprehension test, but not for the intelligibility measure. Each listening session was highly controlled and supervised; listeners completed each test session in an approved computer lab, using headsets to ensure sound fidelity. 
Five links were created that were identical except for the order of listening passages and nonsense sentences. The links were distributed randomly to the participants such that approximately 20 per cent of the listeners received Link 1, another 20 per cent received Link 2, etc. This was to control for order effects as well as the effect of listener fatigue. All listeners were compensated for their participation.

5.4 Phonetic and phonological analysis

Recorded speech was subjected to phonetic and phonological analyses for segmental, prosodic, and fluency features. The features of interest are known to be highly correlated with native English speakers’ (NES) and NNESs’ communicative success (Anderson-Hsieh and Venkatagiri 1994; Pickering 2001; Kormos and Denes 2004). Since pronunciations characteristic of other varieties of English are not considered ‘errors’ in this study, we use the term ‘divergence’ to refer to any difference relative to the American English standard typically associated with the TOEFL. The GA English used as a basis for comparison was the English referenced in Avery and Ehrlich (2008). A trained phonetician identified all divergences in the TOEFL lecture recordings. Inter-coder reliability of .93 or above, measured through intraclass correlation coefficients, was achieved between the trained phonetician and two other coders who analyzed a subset of 10 per cent of the speech data. All pronunciation categories examined are provided in Table 1. Although the final two features (i.e. absence of postvocalic /r/ and absence of flap) should not be regarded as production ‘errors’, being instead characteristic of certain varieties of English, they sometimes serve to cause a breakdown of intelligibility and were thus noted. If a word that contained a divergence from American English was repeated (with that same divergence), each occurrence was tallied separately. We further analyzed certain segmental divergences (i.e. 
consonant and vowel substitutions) in terms of the functional load principle, classifying them as high versus low functional load substitutions, following Catford (1987) and Brown (1991). Rankings from 51 to 100 per cent were considered high functional load substitutions (e.g. 'pit' versus 'bit') and those from 0 to 50 per cent low functional load substitutions (e.g. 'they' versus 'dey'). Segmental divergence was calculated as the total number of segmental divergences divided by the total number of syllables articulated. This form of assessment was completed for both the listening passages and the sentences.

Table 1: Phonological and phonetic analyses of 18 speakers' listening comprehension passages

Segmental features: consonant deletion; vowel deletion; syllable reduction; consonant cluster simplification; high functional load vowel divergence; low functional load vowel divergence; high functional load consonant divergence; low functional load consonant divergence; linking divergence; consonant insertion; vowel insertion; dark /l/; absence of postvocalic /r/; absence of flap.
Prosodic features: space; pace; number of prominent syllables; word stress divergence; level tone choices; falling tone choices; rising tone choices; number of tone units; rhythm.
Fluency features: articulation rate; MLR; phonation time ratio; number of silent pauses; mean length of pauses; unexpected pause ratio; expected pause ratio.

Additional information regarding the phonetic and phonological analyses can be found in the Online Supplementary Materials and also in Kang et al. (2018a).

5.5 Data analysis

During our initial analyses of the data (i.e.
linear mixed-effects models (LMEMs) and analyses of variance (ANOVAs)), we found that the American and British listeners obtained significantly different TOEFL lecture comprehension scores than the other listeners (p < .001) (see Kang et al. (2018b) for more details). Accordingly, the American and British groups were excluded from the primary analysis of this study, as they do not represent target TOEFL test-takers. The subsequent analyses were conducted using a listener group comprising four countries, that is, South African, Indian, Spanish, and Chinese listeners. Several analyses were conducted to answer the first research question regarding the effect of the phonological features of speech on listeners' comprehension scores. To facilitate interpretation of the relationship between each of the phonological features and the listeners' comprehension tests, we first categorized the pronunciation features into three groups: (i) segmentals, (ii) prosody, and (iii) fluency. The segmental features include vowels and consonants, while the prosodic features involve stress, intonation, and rhythm. Fluency features refer to articulation rate, mean length of run, pauses, and hesitation markers. Initially, a principal component analysis (PCA) was conducted as part of the dimension-reduction process so that the number of normed phonological variables could be reduced for the final analysis, as in Kang et al. (2018a). Pearson correlation coefficients were computed to ensure that the final set of selected phonological features were not highly correlated with each other. LMEMs were then performed to examine how the overall phonological features could affect the listening test scores. LMEM was chosen as the primary analysis due to its robust power and flexibility in including both random and fixed effects (Faraway 2005). For this mixed model, we treated the reduced phonological features as covariates and speakers and listeners as random effects.
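The collinearity screen mentioned above can be sketched in a few lines; the feature vectors below are hypothetical, standing in for two of the reduced phonological variables measured across speakers:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical per-speaker values for two reduced variables:
impeding_prosody = [1.2, 0.8, 1.5, 0.9, 1.1, 1.4]
enhancing_fluency = [3.0, 3.2, 3.1, 2.9, 3.3, 2.8]
r = pearson_r(impeding_prosody, enhancing_fluency)
print(round(r, 3))  # -0.293, below the |.325| level treated as acceptable
```

In practice each pair of the five clustered variables would be checked this way before entering the mixed model together.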
Listening comprehension scores and intelligibility scores served as dependent variables. These variables were also checked for normal distribution. The skewness and kurtosis test results ranged from −1.4 to 2.4 with SDs of 0.14–0.74. Given that acceptable values for psychometric purposes are between −2 and +2 (George and Mallery 2010), these variables were considered to be normally distributed. Our preliminary analyses (i.e. LMEM and ANOVAs) revealed no statistically significant difference in listeners' comprehension scores among the six GA and RP speakers and the four most highly intelligible speakers (one speaker representing each of the South African, Indian, Spanish, and Chinese accents) (p > .54), based on pairwise comparisons of the average listening comprehension scores for all of these groups. Accordingly, the pronunciation features of these 10 highly intelligible speakers were used to describe a cutoff point for a speaker's intelligibility threshold. Descriptive statistics were used to tentatively establish the threshold baseline.

6. Results

First, we conducted PCA analyses to reduce the number of phonological variables. We provide a summary of the results below; more detailed information regarding this preliminary step of our analysis can be found in Kang et al. (2018a). The PCA was computed three times, independently for each of the three categories (segmental features, prosodic features, and fluency features), to maintain the transparent nature of each category, which enhanced the interpretation of the composite variables. In the first computation, the principal component consisted of the following six features: consonant deletions, syllable reductions, consonant cluster divergence, high functional load vowel substitutions, low functional load vowel substitutions, and high functional load consonant substitutions.
All six features extracted had positive coefficients; accordingly, we combined all six features and created one super-feature called 'consonant and vowel divergence'. Pace, word stress divergence, and falling tone choices patterned together with negative component loadings, whereas number of tone units, rising tone choices, and rhythm showed the opposite pattern, with positive coefficients; all six were strongly correlated with the first principal component. Accordingly, we labeled the first three features (pace, word stress divergence, and falling tone choices) 'impeding prosodic markers' and the other three features (number of tone units, rising tone choices, and rhythm) 'enhancing prosodic markers'. Articulation rate, number of silent pauses, and unexpected pause ratio revealed a positive relationship with the first principal component, whereas mean length of run (MLR) and expected pause ratio showed a negative association. As a result, the first three features were grouped and labeled 'impeding fluency markers', as increases in these variables hindered the listeners from understanding the lectures. The other two features were composited together and labeled 'enhancing fluency markers' because increases in these features (i.e. MLR and expected pause ratio) would enhance the listeners' listening comprehension. In sum, for the LMEM analysis, the following five independent variables were created as predictors for the response variables: (i) vowel and consonant divergence, (ii) impeding prosody markers, (iii) enhancing prosody markers, (iv) impeding fluency markers, and (v) enhancing fluency markers. Table 2 provides a summary of these five categories with their phonological features.
Table 2: Summary of phonological features for the five clustered variables

Vowel and consonant divergence: consonant deletion, syllable reduction, consonant cluster divergence, high functional load vowel substitutions, low functional load vowel substitutions, and high functional load consonant substitutions.
Impeding prosody markers: pace, word stress divergence, and falling tone choices.
Enhancing prosody markers: number of tone units, rising tone choices, and rhythm.
Impeding fluency markers: articulation rate, number of silent pauses, and unexpected pause ratio.
Enhancing fluency markers: MLR and expected pause ratio.

6.1 Effect of phonological features on listener comprehension and intelligibility scores

The Pearson correlations of the five clustered variables confirmed that all variables were relatively independent of each other (.325 or lower). Consequently, we computed linear mixed-effects models using each listener and speaker as random effects and all five phonological variables as covariates, with listening comprehension and intelligibility as dependent variables. Table 3 reports estimates of the main effects for each of the phonological parameters in each of the two models. The correlation estimates are the slopes for the effect on the outcome variables, and t-values are the estimates divided by the standard errors. The results showed that vowel/consonant divergence, enhancing prosody, and impeding fluency significantly affected the listeners' comprehension scores (p < .05). Using Nakagawa and Schielzeth's (2013) suggested formula, a conditional R2 of 0.31 was calculated with the fixed- and random-effect variance included. That is, approximately 31 per cent of the variance in listening comprehension scores was explained by the phonological variables selected in the model when both random (listeners and speakers) and fixed factors were considered.

Table 3: Estimates of main effects for the five selected phonological features on the listening comprehension test and intelligibility measure (df = 714 for all parameters)

Listening comprehension test (corr. est., S.E., t, sig.):
Intercept: 5.505, 0.607, 9.065, .000*
Vowel/consonant divergence: −10.2, 1.630, −6.255, .000*
Impeding prosody: −0.218, 0.007, −1.643, .101
Enhancing prosody: 0.034, 0.030, 2.447, .015*
Impeding fluency: −0.018, 0.013, −2.638, .008*
Enhancing fluency: 0.0321, 0.014, 1.088, .277

Intelligibility measure (corr. est., S.E., t, sig.):
Intercept: 11.073, 1.438, 7.698, .000*
Vowel/consonant divergence: −49.801, 3.862, −12.895, .000*
Impeding prosody: −1.761, 0.170, −5.590, .000*
Enhancing prosody: 0.002, 0.071, 0.084, .933
Impeding fluency: 0.028, 0.315, 1.701, .089
Enhancing fluency: 0.444, 0.033, 6.214, .000*

Notes. Corr. est. = correlation estimate; S.E. = standard error. *Significant at p < .05. Listening comprehension scale = 0–6; intelligibility scale = 0–16.

Vowel and consonant divergence was a significant predictor of listening comprehension, with an effect size of d = 0.15, indicating a small effect. Its negative correlation estimate indicates an inverse relationship with the comprehension scores: as the listeners' performance increased, the speech samples contained fewer vowel and consonant divergences, particularly consonant cluster divergence, syllable or consonant deletions, and high functional load vowel and consonant substitutions. Enhancing prosody markers significantly predicted the listening test score, with a small effect size of d = 0.03. These enhancing markers were directly proportional to the comprehension test scores: the listeners' performance improved significantly when the number of tone units and rising tones increased in the speech. The speaker's rhythmic ability, as part of enhancing prosody, was also a significant predictor of the listening test. That is, when the speaker made a large contrast between stressed and unstressed syllables, which is how rhythm was measured in this study, the listeners comprehended the lecture better than when the speaker did not attend to this contrast.
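The conditional R2 reported above follows Nakagawa and Schielzeth's (2013) formula, which pools fixed-effect and random-effect variance against total variance. A minimal sketch (the variance components below are hypothetical, chosen only to illustrate how a value such as 0.31 arises):

```python
def conditional_r2(var_fixed, var_random, var_residual):
    """Nakagawa & Schielzeth (2013): proportion of variance explained by
    the fixed and random effects together, out of the total variance."""
    total = var_fixed + var_random + var_residual
    return (var_fixed + var_random) / total

# Hypothetical variance components (not the study's estimates):
print(round(conditional_r2(var_fixed=0.20, var_random=0.11,
                           var_residual=0.69), 2))  # 0.31
```

The marginal R2 variant would use only var_fixed in the numerator, isolating the contribution of the phonological covariates from that of the speaker and listener random effects.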
In addition, impeding fluency markers significantly predicted the response variable (d = 0.04), a small effect size with a negative relationship with the listening test scores. In other words, the listeners' performance decreased when listening to speech that was fast or contained many pauses, particularly in unexpected pause locations. Three of the five phonological features made significant contributions to predicting the intelligibility scores. The vowel/consonant divergence variable was the strongest, with an effect size of d = 0.21, followed by enhancing fluency markers with an effect size of d = 0.16. The impeding prosody markers also significantly predicted the measure of intelligibility, with an effect size of d = 0.09; the coefficient estimate was negative, indicating an inversely proportional relationship with intelligibility scores. The enhancing prosody and impeding fluency markers did not exert any significant effect on the nonsense-statement measure. The conditional R2 for this model was 0.69 with both the fixed- and random-effect variance included. Note that even though the selected phonological variables reported above showed statistical significance (p < .01), their effect sizes were small (0.03–0.21) and might not carry meaningful effects.

6.2 Threshold of intelligibility for high-stakes listening comprehension test

We then wanted to determine the point at which speakers were equally intelligible despite differences in first language; in other words, we sought a threshold of intelligibility. It is important to remember, however, that our findings are mostly exploratory and descriptive in nature due to the small sample of speakers (18). A bar graph was created to examine the distribution of the 18 speakers' intelligibility scores as operationalized by the nonsense statement method (see Figure 1). The speakers are displayed on the horizontal axis; the y-axis represents the intelligibility scores, ranging from 0 to 16.
Figure 1: Distribution of intelligibility scores of nonsense sentences for 18 speakers. Note. For each traditionally non-standard variety, the first speaker in each group has the lowest intelligibility, while the third speaker in each group has the highest intelligibility. Intelligibility scale = 0–16.

Table 4 below shows the descriptive statistics for the intelligibility scores assigned to all 18 speakers. We used GA and RP Englishes as the baseline comparison for the following reasons. First, in the range of 0–16 (i.e. 4 missing words per sentence × 4 sentences per speaker = 16), the GA and RP speakers scored 11.35 or higher (i.e. the mean of British Speaker #2 = 11.35) on the intelligibility measure. This means that three out of four missing words should be accurately transcribed in a decontextualized sentence if a speaker is to be considered highly intelligible. We can cautiously argue that, when intelligibility is operationalized via nonsense statements, a speaker's intelligibility score should be approximately 11.35 or higher on the 0–16 scale if that speaker is to be used for the listening test. This means that approximately 71 per cent (11.35/16) accuracy is required on this particular intelligibility measure. The highly intelligible South African (SA), Indian (IN), Spanish (SP), and Chinese (CH) English speakers fell securely within this range of intelligibility scores: SA3 = 11.7, IN3 = 13.28, SP3 = 12.9, and CH3 = 12.01, respectively. Interestingly, on this nonsense measure, no speaker received an average full score of 16, presumably due to the complexity of the task.
In other words, it may have been too cognitively challenging for listeners to achieve perfect scores, regardless of speakers' intelligibility.

Table 4: Descriptive statistics for the intelligibility scores of all 18 speakers (mean, SD, 95 per cent confidence interval for mean [lower bound, upper bound], minimum, maximum)

GA1: 12.516, 2.683, [11.823, 13.210], 6.00, 16.00
GA2: 11.466, 3.055, [10.677, 12.256], 5.00, 15.00
GA3: 13.166, 1.786, [12.705, 13.628], 8.00, 16.00
RP1: 12.616, 2.687, [11.922, 13.311], 6.00, 16.00
RP2: 11.350, 3.318, [10.498, 12.207], 3.00, 16.00
RP3: 13.616, 2.307, [13.020, 14.212], 8.00, 16.00
SA1: 8.750, 2.159, [8.192, 9.308], 3.00, 13.00
SA2: 8.616, 2.565, [7.954, 9.279], 3.00, 14.00
SA3: 11.700, 2.513, [11.050, 12.349], 5.00, 16.00
IN1: 9.883, 3.425, [8.998, 10.768], 3.00, 16.00
IN2: 10.900, 2.710, [10.199, 11.600], 4.00, 15.00
IN3: 13.283, 2.917, [12.529, 14.037], 6.00, 16.00
SP1: 10.033, 1.765, [9.577, 10.489], 7.00, 13.00
SP2: 9.100, 1.612, [8.683, 9.516], 6.00, 12.00
SP3: 12.916, 1.441, [12.544, 13.289], 8.00, 15.00
CH1: 10.116, 1.439, [9.744, 10.488], 6.00, 13.00
CH2: 10.333, 1.653, [9.906, 10.760], 5.00, 13.00
CH3: 12.016, 1.722, [11.571, 12.461], 9.00, 15.00

Table 5 provides descriptive statistics of listeners' comprehension scores among the six GA and RP speakers and the four most highly intelligible speakers from each of the four countries mentioned above. No statistical difference was found in pairwise comparisons of all of these groups for the average listening comprehension scores (i.e.
F = 0.811, p = .542). Accordingly, these 10 speakers were used to explore the threshold of a speaker's intelligibility in the next section. Note that additional scatter plot distributions of all 18 speakers confirmed that the speakers with high listening comprehension scores (i.e. 5 or above out of 6) also corresponded to those with high intelligibility scores (i.e. approximately 12.5 or above out of 16).

Table 5: Descriptive statistics for the listening comprehension scores of 10 highly intelligible speakers (mean, SD, 95 per cent confidence interval for mean [lower bound, upper bound], minimum, maximum)

GA: 5.405, 0.809, [5.287, 5.525], 2.00, 6.00
RP: 5.372, 0.845, [5.253, 5.491], 2.00, 6.00
SA: 5.400, 0.763, [5.194, 5.606], 3.00, 6.00
IN: 5.283, 0.975, [5.077, 5.489], 2.00, 6.00
SP: 5.566, 0.592, [5.361, 5.773], 4.00, 6.00
CH: 5.366, 0.780, [5.161, 5.573], 3.00, 6.00

The phonological characteristics of these six GA and RP speakers and the four most intelligible South African, Indian, Spanish, and
Chinese speakers are described in Table 6. The characteristics listed include prominence and tone selection, as these have been found to have direct and consequential impacts on listener perception and comprehension, as well as segmental features that deviated from Standard American English yet did not significantly diminish intelligibility. The information below suggests that highly intelligible speakers whose speech may manifest some divergence from GA English norms can be used for recording lectures in assessment contexts. In other words, some (particular) divergences from GA English do not decrease the general intelligbility of highly intelligible speakers, indicating that these speakers’ renditions of listening comprehension passages in high-stakes English tests, such as the TOEFL, would not be a source of test bias. The phonlogical features in Table 6 can be considered the features that are more likely to be associated with highly intelligible speakers. Table 6: Descriptions of the phonological features of individual speakers rated highly intelligible Speaker Common phonological features American 1 Pace (the average number of prominent syllables per run) = 2.72 Falling tone = 61 per cent; level tone = 0 per cent; rising tone = 39 per cent American 2 Pace (the average number of prominent syllables per run) = 2.37 Falling tone = 39 per cent; level tone =2 per cent; rising tone = 59 per cent American 3 Pace (the average number of prominent syllables per run) = 2.48 Falling tone = 59 per cent; level tone = 6 per cent; rising tone = 35 per cent British 1 Pace (the average number of prominent syllables per run) = 2.53 Falling tone = 51 per cent; level tone = 0 per cent; rising tone = 49 per cent /ɔ/→/ow/ (1) Absence of flap (various times) Absence of postvocalic r (various times) /ɑ/→/ ɔ/ (3) /æ/→/ɑ/ (various times) Addition of /h/ (1) Minor word stress (1) British 2 Pace (the average number of prominent syllables per run) = 2.15 Falling tone = 54 per cent; level 
tone = 2 per cent; rising tone = 44 per cent Word stress (1) /æ/→/ɑ/ (various times) British 3 Pace (the average number of prominent syllables per run) = 2.04 Falling tone = 48 per cent; level tone = 4 per cent; rising tone = 28 per cent Absence of postvocalic r (various times) /ɔ/→/ow/ (1) Absence of flap (various times) / ɑ/→/ ɔ/ (1) South African 3 Pace (the average number of prominent syllables per run) = 2.41 Falling tone = 52 per cent; level tone = 12 per cent; rising tone = 36 per cent Word stress (2) Absence of flap (various times) Absence of postvocalic r (various times) Trilled r (various times) Indian 3 Pace (the average number of prominent syllables per run) = 2.81 Falling tone = 63 per cent; level tone = 0 per cent; rising tone = 37 per cent /s/→/z/ (1) /w/→/v/ (1) Word stress (1) Absence of the flap (various times) Spanish 3 Pace (the average number of prominent syllables per run) = 2.30 Falling ton e = 57 per cent; level tone = 3 per cent; rising tone = 40 per cent /I/→/iy/ (1) /z/→/s/ (2) / ð/→/d/ (1) / ɑ/→/ow/ (1) Chinese 3 Pace (the average number of prominent syllables per run) = 2.23 Falling tone = 52 per cent; level tone = 0 per cent; rising tone = 44 per cent /I/ /→ /iy/ (1) /ɛ/→ /iy/ (1) Speaker Common phonological features American 1 Pace (the average number of prominent syllables per run) = 2.72 Falling tone = 61 per cent; level tone = 0 per cent; rising tone = 39 per cent American 2 Pace (the average number of prominent syllables per run) = 2.37 Falling tone = 39 per cent; level tone =2 per cent; rising tone = 59 per cent American 3 Pace (the average number of prominent syllables per run) = 2.48 Falling tone = 59 per cent; level tone = 6 per cent; rising tone = 35 per cent British 1 Pace (the average number of prominent syllables per run) = 2.53 Falling tone = 51 per cent; level tone = 0 per cent; rising tone = 49 per cent /ɔ/→/ow/ (1) Absence of flap (various times) Absence of postvocalic r (various times) /ɑ/→/ ɔ/ (3) /æ/→/ɑ/ (various 
times) Addition of /h/ (1) Minor word stress (1) British 2 Pace (the average number of prominent syllables per run) = 2.15 Falling tone = 54 per cent; level tone = 2 per cent; rising tone = 44 per cent Word stress (1) /æ/→/ɑ/ (various times) British 3 Pace (the average number of prominent syllables per run) = 2.04 Falling tone = 48 per cent; level tone = 4 per cent; rising tone = 28 per cent Absence of postvocalic r (various times) /ɔ/→/ow/ (1) Absence of flap (various times) / ɑ/→/ ɔ/ (1) South African 3 Pace (the average number of prominent syllables per run) = 2.41 Falling tone = 52 per cent; level tone = 12 per cent; rising tone = 36 per cent Word stress (2) Absence of flap (various times) Absence of postvocalic r (various times) Trilled r (various times) Indian 3 Pace (the average number of prominent syllables per run) = 2.81 Falling tone = 63 per cent; level tone = 0 per cent; rising tone = 37 per cent /s/→/z/ (1) /w/→/v/ (1) Word stress (1) Absence of the flap (various times) Spanish 3 Pace (the average number of prominent syllables per run) = 2.30 Falling ton e = 57 per cent; level tone = 3 per cent; rising tone = 40 per cent /I/→/iy/ (1) /z/→/s/ (2) / ð/→/d/ (1) / ɑ/→/ow/ (1) Chinese 3 Pace (the average number of prominent syllables per run) = 2.23 Falling tone = 52 per cent; level tone = 0 per cent; rising tone = 44 per cent /I/ /→ /iy/ (1) /ɛ/→ /iy/ (1) Note. The number or percentage after each feature signifies number of occurrences. 
Table 6: Descriptions of the phonological features of individual speakers rated highly intelligible

American 1: pace (average number of prominent syllables per run) = 2.72; falling tone = 61 per cent, level tone = 0 per cent, rising tone = 39 per cent.
American 2: pace = 2.37; falling tone = 39 per cent, level tone = 2 per cent, rising tone = 59 per cent.
American 3: pace = 2.48; falling tone = 59 per cent, level tone = 6 per cent, rising tone = 35 per cent.
British 1: pace = 2.53; falling tone = 51 per cent, level tone = 0 per cent, rising tone = 49 per cent; /ɔ/→/ow/ (1); absence of flap (various times); absence of postvocalic r (various times); /ɑ/→/ɔ/ (3); /æ/→/ɑ/ (various times); addition of /h/ (1); minor word stress (1).
British 2: pace = 2.15; falling tone = 54 per cent, level tone = 2 per cent, rising tone = 44 per cent; word stress (1); /æ/→/ɑ/ (various times).
British 3: pace = 2.04; falling tone = 48 per cent, level tone = 4 per cent, rising tone = 28 per cent; absence of postvocalic r (various times); /ɔ/→/ow/ (1); absence of flap (various times); /ɑ/→/ɔ/ (1).
South African 3: pace = 2.41; falling tone = 52 per cent, level tone = 12 per cent, rising tone = 36 per cent; word stress (2); absence of flap (various times); absence of postvocalic r (various times); trilled r (various times).
Indian 3: pace = 2.81; falling tone = 63 per cent, level tone = 0 per cent, rising tone = 37 per cent; /s/→/z/ (1); /w/→/v/ (1); word stress (1); absence of flap (various times).
Spanish 3: pace = 2.30; falling tone = 57 per cent, level tone = 3 per cent, rising tone = 40 per cent; /I/→/iy/ (1); /z/→/s/ (2); /ð/→/d/ (1); /ɑ/→/ow/ (1).
Chinese 3: pace = 2.23; falling tone = 52 per cent, level tone = 0 per cent, rising tone = 44 per cent; /I/→/iy/ (1); /ɛ/→/iy/ (1).

Note. The number in parentheses after each feature signifies the number of occurrences; pace is the average number of prominent syllables per run.

Based on the frequency of vowel and consonant divergences in each highly intelligible speaker's recording of the entire passage, we calculated divergence rates. The phonological features above demonstrate that divergence rates for vowels in content words for intelligible speakers are below 4.1 per cent (i.e. fewer than 5 divergences out of 120 content words), and divergence rates for consonants in content words are below 2.5 per cent (i.e. fewer than 3 divergences out of 120 content words). We could then look at the maximum number of divergence cases for each phonological category. For example, to set a divergence rate for vowels, we examined vowel divergence occurrences across all 10 speakers. British Speaker #1 had five divergences, Mexican Speaker #3 two divergences, Chinese Speaker #3 two divergences, and so forth. The five vowel divergences found for the British speaker were the highest (most frequent) value among all speakers. Speakers achieving divergence rates for vowels in content words below 4.1 per cent, divergence rates for consonants in content words below 2.5 per cent, divergence rates for lexical stress below 1.6 per cent (i.e. no more than 2 stress divergences out of 120 content words), and divergence rates for the number of prominent words in content words (pace = 2.72) can be considered intelligible, particularly in the assessment context of the current study.
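The threshold calculation described above can be sketched in a few lines of code. This is a hypothetical illustration, not the study's actual analysis script; the function and variable names are our own, while the counts and cut-offs follow the figures reported above.

```python
# Sketch of the divergence-rate screening described above. Function and
# variable names are illustrative; thresholds follow the reported figures.

def divergence_rate(divergences: int, content_words: int = 120) -> float:
    """Return the divergence rate as a percentage of content words."""
    return 100 * divergences / content_words

# Thresholds reported for highly intelligible speakers (per cent):
VOWEL_MAX = 4.1           # vowels in content words
CONSONANT_MAX = 2.5       # consonants in content words
LEXICAL_STRESS_MAX = 1.6  # lexical stress in content words

def within_thresholds(vowel_div, consonant_div, stress_div, content_words=120):
    """True if all three divergence rates fall below the reported cut-offs."""
    return (divergence_rate(vowel_div, content_words) < VOWEL_MAX
            and divergence_rate(consonant_div, content_words) < CONSONANT_MAX
            and divergence_rate(stress_div, content_words) < LEXICAL_STRESS_MAX)

# British Speaker #1: five vowel divergences out of 120 content words
print(round(divergence_rate(5), 2))  # 4.17, just above the 4.1 per cent cut-off
```

As the final line shows, five vowel divergences in a 120-content-word passage sits at the very edge of the vowel threshold, which is why that speaker's count marks the maximum observed value.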
For vowels and consonants, divergences involving segments with high functional load may not exceed 1.6 per cent (i.e. 2 errors out of 120 words). For tone choices, falling tones should be used no more than 63 per cent of the time, and rising tones at least 37 per cent of the time.

7. Discussion

We examined the extent to which variability in three clusters of phonetic/phonological features commonly associated with intelligible and comprehensible speech might predict listeners' comprehension scores on the TOEFL listening test and their intelligibility test scores. These clusters were defined as segmental, prosodic, and fluency features. Among segmental features, divergences in high functional load vowels, along with divergences in both high and low functional load consonants, consonant deletion, syllable reduction, and consonant cluster divergence, were predictors of both listener comprehension and intelligibility scores. The results concerning functional load not only support previous research but also provide experimental evidence validating Catford's (1987) and Brown's (1991) claims regarding functional load. Beyond supporting the notion of functional load, our results also agree with Bent, Bradlow, and Smith (2007), who found that many consonant and vowel errors played a role in intelligibility. Probing individual consonants in more detail, our results are also similar to those of Magen (1998), who found that some consonant divergences matter to intelligibility, again possibly related to functional load. With respect to consonant divergence, it is also worth noting that its impact on intelligibility differs depending on the position of the consonant error in a word (Bent et al. 2007); word-initial consonant divergence is more detrimental to intelligibility than consonant divergence in other positions.
While our data did not lend themselves to a similar analysis, it is advisable to consider this constraint when determining the severity of particular consonant divergences in the development of test listening passages spoken in varied English accents. We also examined selected suprasegmental features associated with intonation, stress, and rhythm. Of these, the appropriate use of rising tones, tone units, and rhythm, clustered as enhancing prosody markers, were found to be strong predictors of our participants' listening comprehension scores. At the same time, lexical stress and the inappropriate use of falling tone significantly predicted intelligibility scores. These findings are highly consistent with previous studies. For example, Wennerstrom (2000) and Kang (2010) have convincingly demonstrated that accurate lexical stress is particularly consequential for intelligibility, relative to other suprasegmental error types. Furthermore, divergence in lexical stress assignment has been shown to cause listeners difficulty in comprehending oral texts (Hahn 2004). With respect to the appropriate use of rising tone, Kang (2012) argues that patterns associated with particular English accents can cause listeners from other groups to have difficulty paying attention, or can even lead to listeners' misunderstanding of speakers' intent. More specific to language assessment, Kang (2010) found that when there is a mismatch between listener expectations and speech rhythm, the speaker's comprehensibility declines, along with listener perceptions of the speaker's oral proficiency. The temporal fluency measures we calculated from the listening samples were also found to significantly predict listener comprehension and intelligibility scores, but the strength of this relationship was mixed and may therefore be of limited importance. Impeding fluency significantly predicted listener comprehension but did not strongly predict intelligibility scores.
On the other hand, enhancing fluency predicted intelligibility scores significantly, but not listener comprehension. These mixed findings concerning the relationship between temporal measures and comprehension and intelligibility should not be taken to imply that temporal aspects of speech are unimportant. Rather, the nature of our stimuli was such that temporal features across speech samples were homogeneous, idealized, and controlled. All of our speakers who provided recordings for the listening passages had been given the opportunity to rehearse the speech samples ahead of time and were asked to re-record them in any cases where the speech rate did not fall within the normally expected TOEFL range. This strict control would be a methodological limitation if we were investigating intelligibility in a natural context, since it precluded a more detailed analysis of the role of temporal measures in listening comprehension. However, it is justifiable for our target context, in which these measures (i.e. rehearsing, controlling for speech rate, etc.) are likely to be taken with prospective lecture readers. The results confirm that when listening passages are spoken with unfamiliar accents, as long as the normal speech rate expected for the listening test is controlled, test developers can be confident that any loss of comprehension for a particular speech sample is attributable to segmental and/or suprasegmental divergence, and not to temporal features. The highly intelligible speakers discussed in Table 6 above rarely showed consonant deletion, syllable reduction, or consonant cluster errors. However, their speech did include certain segmental divergences involving high functional load vowels, low functional load vowels, and high functional load consonants. Such divergences were nonetheless sparse and limited (i.e. occurring once or twice over an entire speech sample).
Therefore, we can cautiously state that speakers who are recruited to produce spoken materials for a high-stakes listening comprehension test should demonstrate very few segmental divergences in content words over the entire listening passage. For example, TOEFL passages comprise approximately 120 content words. The highly intelligible speakers in our study whom we believe would be acceptable candidates for producing spoken test materials produced two to five segmental divergences within high functional load vowels and consonants, corresponding to divergence rates of about 1.6 to 4 per cent. We recommend discounting segmental divergence in function words, as it does not lead to a decrease in passage comprehensibility. Ultimately, our attempt to identify which segmental, suprasegmental, and temporal features are most deleterious to listener comprehension and intelligibility scores was complicated by the large number of interrelations between features typically associated with each category of divergence. It seems unlikely that any particular divergence type impacts intelligibility and comprehension scores in isolation; rather, the nature of the divergence, where it occurs in relation to the proposition of an utterance, and other listener variables interact in determining the severity of a particular divergence and the overall comprehensibility of a particular listening passage. Despite these obvious complexities, by identifying those features that are most likely to impact comprehension of listening passages spoken in a variety of English accents, our results provide a reasoned basis for excluding particular speakers as candidates from whom to obtain assessment-related speech samples due to low intelligibility. The current findings inform future research on the phonological divergence of highly intelligible speech and its relationship with listening performance. Overall, test development can be informed by these results.
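The arithmetic behind these figures, including the recommendation to count divergences over content words only, can be made concrete with a short sketch. The token triples below are a hypothetical fragment for illustration, not study data.

```python
# Illustration of the divergence-rate arithmetic above, assuming a
# 120-content-word TOEFL-type passage and discounting function words.
# The sample token list is hypothetical, not taken from the study.

def content_word_divergence_rate(tokens):
    """tokens: list of (word, is_content_word, has_divergence) triples."""
    content = [t for t in tokens if t[1]]
    diverged = sum(1 for t in content if t[2])
    return 100 * diverged / len(content)

# Two to five divergences in a 120-content-word passage give roughly
# 1.7 to 4.2 per cent, close to the 'about 1.6 to 4 per cent' quoted above:
for n in (2, 5):
    print(f"{n}/120 = {100 * n / 120:.1f} per cent")

# A divergence in a function word ('the') is discounted entirely:
sample = [("the", False, True), ("lecture", True, False),
          ("covers", True, True), ("climate", True, False),
          ("change", True, False)]
print(f"{content_word_divergence_rate(sample):.1f} per cent")  # 25.0 per cent
```

The point of the sketch is simply that the reported range follows directly from counts of two to five divergences over roughly 120 content words, with function-word divergences excluded from both numerator and denominator.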
Based on our findings, we would recommend that if test developers are to include English varieties beyond American and British, they should follow several guidelines. First, in terms of the phonological features of the listening passages themselves, even though generalizations should be made very carefully, highly intelligible speakers (see Table 5) are characterized as rarely exhibiting consonant deletion, inappropriate syllable reduction, or divergence in the pronunciation of consonant clusters relative to American English norms. The characteristics of these speakers are analogous to those of the speakers who showed very weak strength of accent in Ockey and French (2016). Speakers who do exhibit these segmental divergences should be avoided in English assessments. Second, the speech of highly intelligible speakers can include certain segmental divergences involving high functional load vowels, low functional load vowels, and high functional load consonants. However, the occurrence of such divergence is very limited (i.e. once or twice through the entire TOEFL-type listening text), and speakers likewise should exhibit few divergences of this nature. Last, we recommend treating segmental divergence in function words, regardless of the segment's functional load, as less egregious, since divergence in function words is not expected to lead to a decrease in passage comprehensibility. Therefore, when developing tests, a speaker should not be automatically rejected if their speech contains some divergence. However, the contextualized speech (i.e. the passage read by the target speaker) must be carefully analyzed for the types and locations of divergence. Test developers should also attempt to avoid errors that occur in a piece of information that is critical for an item, because even one error in an otherwise accurate reading could affect a listener's answer choice if that error involves a crucial word or phrase.
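The guidelines above could be sketched as a simple rule-based screening check. The feature names and counts below are hypothetical; the thresholds follow the guidelines (avoid deletions, reductions, and cluster divergence entirely, and tolerate other segmental divergences only once or twice per passage).

```python
# A hypothetical rule-based sketch of the speaker-screening guidelines above.
# Feature names are illustrative, not categories from the study's coding scheme.

DISQUALIFYING = {"consonant_deletion", "syllable_reduction", "cluster_divergence"}
TOLERATED_MAX = 2  # 'once or twice through the entire TOEFL-type listening text'

def screen_speaker(feature_counts: dict) -> bool:
    """Return True if the speaker's passage reading passes the screening."""
    if any(feature_counts.get(f, 0) > 0 for f in DISQUALIFYING):
        return False
    others = (n for f, n in feature_counts.items() if f not in DISQUALIFYING)
    return all(n <= TOLERATED_MAX for n in others)

# A profile resembling Spanish 3 in Table 6: a few isolated vowel and
# consonant divergences, no deletions or cluster errors -- passes.
print(screen_speaker({"vowel_high_fl": 1, "consonant_high_fl": 2}))  # True
# Any consonant deletion disqualifies outright:
print(screen_speaker({"consonant_deletion": 1}))  # False
```

A check like this could only be a first pass; as noted above, the passage itself must still be analyzed for where each divergence falls relative to item-critical information.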
While this may be more time-consuming and not always predictable with complete accuracy, it is a necessary step if we are to increase the ecological validity of large-scale English tests while simultaneously avoiding bias based on speaker accent.

8. Conclusion

We believe that the most significant impact of the present study is that it provides a rational, systematic approach to selecting speakers with a variety of English accents for inclusion as model speakers in English listening tests. The resulting approach to delineating a threshold of intelligibility, while preliminary, has the potential to take much of the guesswork out of selecting speakers of what have traditionally been considered non-standard English accents to provide speech samples that are most likely to meet test requirements. In this study, despite speaking with noticeably non-standard accents, the four most intelligible speakers with South African, Indian, Spanish, and Chinese accents produced speech samples that did not negatively impact listening test scores. Our findings can also be utilized by language teachers to enhance English learners' communicative success. Despite the limited number of speakers used in this study, by identifying which features of speech are most likely to affect intelligibility and comprehensibility for listeners (i.e. Table 5), teachers and learners can make informed decisions about where to focus their pronunciation instruction. However, it must be noted that this was a script-reading task, which is inherently different from spontaneous communication. To influence pronunciation pedagogy, further research should be conducted on the features of accented speech that affect the intelligibility and comprehensibility of conversational English. Finally, we acknowledge several limitations to our study.
The threshold of intelligibility that we have tentatively attempted to establish in this study should be interpreted with caution, given the limited number of speakers, listeners, and English varieties represented. Future research is needed to validate the current findings with larger samples representing more varied populations. It should also be noted that accent, as investigated in the current study, is only one dimension of difference across international varieties of English. Vocabulary and grammar could have impacted test scores had they not been controlled. In addition, the current study was limited to listeners who were already highly proficient in English, and skilled at taking the high-stakes test in particular. Additional research is called for to examine test-takers who represent a wider range of proficiency levels and testing experience.

NOTES ON CONTRIBUTORS

Okim Kang is an Associate Professor in the applied linguistics program at Northern Arizona University, Flagstaff, AZ, USA. Her research interests are speech production and perception, L2 pronunciation and intelligibility, L2 oral assessment and testing, automated scoring and speech recognition, world Englishes, and language attitudes. She has recently published two edited books: The Routledge Handbook of Contemporary English Pronunciation and Assessment in Second Language Pronunciation. Address for correspondence: Okim Kang, Department of English, Northern Arizona University, Liberal Arts Building 18, Room 140, Flagstaff, AZ 86011-6032, USA.

Ron I. Thomson is a Professor of Applied Linguistics at Brock University. His research focuses on the development of L2 pronunciation and oral fluency. He is also interested in how computer-mediated instruction can facilitate easier and more rapid development of L2 speech perception and production, and has developed www.englishaccentcoach.com, a freely available High Variability Pronunciation Training (HVPT) app for L2 English learners.
Meghan Moran is an Instructor in the English Department at Northern Arizona University, Flagstaff, AZ, USA. Her research interests include speech production and perception, L2 pronunciation and intelligibility, language planning and policy, language education policy, and linguistic discrimination. Meghan has recently co-authored studies with Okim Kang and Ron I. Thomson on second language intelligibility and the inclusion of accented varieties of English in high-stakes assessment, which can be found in TESOL Quarterly and Language Learning.

Footnotes

1 It is important to note that work within the World Englishes paradigm includes attention to variation in lexis, grammar, and pragmatics, in addition to pronunciation. For the purposes of the present discussion, we are only concerned with differences in pronunciation, not with these other dimensions of difference.

FUNDING

This research was funded by the Educational Testing Service (ETS) under a Committee of Examiners and TOEFL research grant. ETS does not discount or endorse the methodology, results, implications, or opinions presented by the researchers.

Conflict of interest statement. None declared.

References

Abeywickrama P. 2013. 'Why not non-native varieties of English as listening comprehension test input?,' RELC Journal 44: 59–74.
Anderson-Hsieh J. and Venkatagiri H. 1994. 'Syllable duration and pausing in the speech of Chinese ESL speakers,' TESOL Quarterly 28: 808–14.
Avery P. and Ehrlich S. 2008. Teaching American English Pronunciation. Oxford University Press.
Bent T., Bradlow A. R., and Smith B. L. 2007. 'Segmental errors in different word positions and their effects on intelligibility of non-native speech: All's well that begins well' in Bohn O. S. and Munro M. J. (eds): Second-Language Speech Learning: The Role of Language Experience in Speech Perception and Production: A Festschrift in Honour of James E. Flege. John Benjamins.
Brown A. 1991. 'Functional load and the teaching of pronunciation' in Brown A. (ed.): Teaching English Pronunciation: A Book of Readings. Routledge.
Catford J. C. 1987. 'Phonetics and the teaching of pronunciation: A systemic description of English phonology' in Morley J. (ed.): Current Perspectives on Pronunciation: Practices Anchored in Theory. TESOL.
Derwing T. and Munro M. J. 2001. 'What speaking rates do non-native listeners prefer?,' Applied Linguistics 22: 324–37.
Derwing T. M. and Munro M. J. 1997. 'Accent, intelligibility, and comprehensibility: Evidence from four L1s,' Studies in Second Language Acquisition 19: 1–16.
Derwing T. M., Rossiter M. J., Munro M. J., and Thomson R. I. 2004. 'Second language fluency: Judgments on different tasks,' Language Learning 54: 655–79.
Elder C. and Harding L. 2008. 'Language testing and English as an international language: Constraints and contributions,' Australian Review of Applied Linguistics 31: 34.1–34.11.
Faraway J. J. 2005. Linear Models in R (Texts in Statistical Science). Chapman and Hall/CRC.
Fayer J. M. and Krasinski E. 1987. 'Native and nonnative judgments of intelligibility and irritation,' Language Learning 37: 313–26.
Field J. 2005. 'Intelligibility and the listener: The role of lexical stress,' TESOL Quarterly 39: 399–423.
Gass S. M. and Varonis E. M. 1984. 'The effect of familiarity on the comprehensibility of nonnative speech,' Language Learning 34: 65–89.
George D. and Mallery M. 2010. SPSS for Windows Step by Step: A Simple Guide and Reference, 17.0 Update. Pearson.
Gimson A. C. 1980. An Introduction to the Pronunciation of English, 3rd edn. Edward Arnold.
Hahn L. D. 2004. 'Primary stress and intelligibility: Research to motivate the teaching of suprasegmentals,' TESOL Quarterly 38: 201–23.
Hamp-Lyons L. and Davies A. 2008. 'The English of English tests: Bias revisited,' World Englishes 27: 27–39.
Harding L. 2011. Accent and Listening Assessment: A Validation Study of the Use of Speakers with L2 Accents on an Academic English Listening Test. Peter Lang.
Iwashita N., Brown A., McNamara T., and O'Hagan S. 2008. 'Assessed levels of second language speaking proficiency: How distinct?,' Applied Linguistics 29: 24–49.
Jenkins J. 2003. World Englishes: A Resource Book for Students. Routledge.
Jenkins J. 2006. 'The spread of EIL: A testing time for testers,' English Language Teaching Journal 60: 42–50.
Kachru B. B. 1992. The Other Tongue: English across Cultures. University of Illinois Press.
Kang O. 2010. 'Relative salience of suprasegmental features on judgments of L2 comprehensibility and accentedness,' System 38: 301–15.
Kang O. 2012. 'Impact of rater characteristics on ratings of international teaching assistants' oral performance,' Language Assessment Quarterly 9: 1–21.
Kang O. and Rubin D. 2009. 'Reverse linguistic stereotyping: Measuring the effect of listener expectations on speech evaluation,' Journal of Language and Social Psychology 28: 441–56.
Kang O., Rubin D., and Pickering L. 2010. 'Suprasegmental measures of accentedness and judgments of language learner proficiency in oral English,' The Modern Language Journal 94: 554–66.
Kang O., Thomson R. I., and Moran M. 2018a. 'Empirical approaches to measuring the intelligibility of different varieties of English in predicting listener comprehension,' Language Learning 68: 115–46.
Kang O., Thomson R., and Moran M. 2018b. 'The effects of international accents and shared first language on listening comprehension tests,' TESOL Quarterly. doi: 10.1002/tesq.463.
Kormos J. and Denes M. 2004. 'Exploring measures and perceptions of fluency in the speech of second language learners,' System 32: 145–64.
Li D. C. S. 2009. 'Researching non-native speakers' views toward intelligibility and identity: Bridging the gap between moral high grounds and down-to-earth concerns' in Sharifian F. (ed.): English as an International Language: Perspectives and Pedagogical Issues. Multilingual Matters.
Linacre J. M. and Wright B. D. 2002. 'Construction of measures from many-facet data,' Journal of Applied Measurement 3: 484–509.
Lippi-Green R. 2012. English with an Accent: Language, Ideology and Discrimination in the United States. Routledge.
Lunz M. E. and Stahl J. A. 1990. 'Judge consistency and severity across grading periods,' Evaluation and the Health Professions 13: 425–44.
Magen H. 1998. 'The perception of foreign-accented speech,' Journal of Phonetics 26: 381–400.
Major R. C., Fitzmaurice S. F., Bunta F., and Balasubramanian C. 2002. 'The effects of nonnative accents on listening comprehension: Implications for ESL assessment,' TESOL Quarterly 36: 173–90.
McNamara T. 1996. Measuring Second Language Performance. Addison Wesley Longman.
Munro M. and Derwing T. 1995. 'Processing time, accent, and comprehensibility in the perception of native and foreign-accented speech,' Language and Speech 38: 289–306.
Munro M. J. and Derwing T. M. 2006. 'The functional load principle in ESL pronunciation instruction: An exploratory study,' System 34: 520–31.
Myford C. M. and Wolfe E. W. 2004. 'Detecting and measuring rater effects using many-facet Rasch measurement: Part II,' Journal of Applied Measurement 5: 189–227.
Nakagawa S. and Schielzeth H. 2013. 'A general and simple method for obtaining R2 from generalized linear mixed-effects models,' Methods in Ecology and Evolution 4: 133–42.
Nelson C. L. 2011. Intelligibility in World Englishes: Theory and Application. Routledge.
Nye P. W. and Gaitenby J. H. 1974. 'The intelligibility of synthetic, monosyllabic words in short, syntactically normal sentences,' Haskins Laboratories Status Report on Speech Research SR-37/38, pp. 169–90.
Ockey G. J. and French R. 2016. 'From one to multiple accents on a test of L2 listening comprehension,' Applied Linguistics 37: 693–715.
Ockey G. J., Papageorgiou S., and French R. 2016. 'Effects of strength of accent on an L2 interactive lecture listening comprehension test,' International Journal of Listening 30: 84–98.
Picheny M. A., Durlach N. I., and Braida L. D. 1985. 'Speaking clearly for the hard of hearing: Intelligibility differences between clear and conversational speech,' Journal of Speech and Hearing Research 28: 96–103.
Pickering L. 2001. 'The role of tone choice in improving ITA communication in the classroom,' TESOL Quarterly 35: 233–55.
Rubin D. L. 1992. 'Nonlanguage factors affecting undergraduates' judgments of non-native English-speaking teaching assistants,' Research in Higher Education 33: 511–31.
Smith L. and Nelson C. 1985. 'International intelligibility of English: Directions and resources,' World Englishes 4: 333–42.
Tavakoli P. and Skehan P. 2005. 'Strategic planning, task structure, and performance testing' in Ellis R. (ed.): Planning and Task Performance in a Second Language. John Benjamins.
Taylor L. 2006. 'The changing landscape of English: Implications for language assessment,' English Language Teaching Journal 60: 51–60.
Taylor L. and Geranpayeh A. 2011. 'Assessing listening for academic purposes: Defining and operationalizing the test construct,' Journal of English for Academic Purposes 10: 89–101.
Thomson R. I. 2015. 'Fluency' in Reed M. and Levis J. (eds): The Handbook of English Pronunciation. Wiley.
TOEFL iBT Test Framework and Test Development. 2010. ETS TOEFL. https://www.ets.org/s/toefl/pdf/toefl_ibt_research_insight.pdf
Wennerstrom A. 2000. 'The role of intonation in second language fluency' in Riggenbach H. (ed.): Perspectives on Fluency. University of Michigan Press.
Yano Y. 2001. 'World Englishes in 2000 and beyond,' World Englishes 20: 119–31.

© The Author(s) (2018). Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com

'Which Features of Accent Affect Understanding? Exploring the Intelligibility Threshold of Diverse Accent Varieties,' Applied Linguistics, Advance Article, 24 December 2018. doi: 10.1093/applin/amy053.