Usability of American Sign Language Videos for Presenting Mathematics Assessment Content

Abstract

There is considerable interest in determining whether high-quality American Sign Language videos can be used as an accommodation in tests of mathematics at both K-12 and postsecondary levels, and in learning more about the usability (e.g., comprehensibility) of ASL videos with two different types of signers – avatar (animated figure) and human. The researchers describe the results of administering each of nine pre-college mathematics items in both avatar and human versions to each of 31 Deaf participants with high school and post-high school backgrounds. This study differed from earlier studies by obliging the participants to rely on the ASL videos to answer the items. While participants preferred the human version over the avatar version (apparently due largely to the better expressiveness and fluency of the human), there was no discernible relationship between mathematics performance and signed version.

Current educational assessments of mathematics in the United States are generally in written English, which can create a barrier for deaf and hard of hearing (D/HH) students, many of whom are less proficient in reading English than their hearing peers. Many D/HH students use American Sign Language (ASL) as their primary language. Delivery in ASL has the potential to improve test access for these individuals. Policies regarding the use of ASL for academic assessments vary by educational level (K-12 versus higher education), state, and academic domain (e.g., language arts versus mathematics). Where policy allows ASL test delivery, it may be feasible to present assessment content in high-quality, standardized videos. An important design choice concerns whether the signer in the videos should be human (recorded) or an avatar (animated character). Use of an avatar version may save on development costs relative to a human version, especially if corrections are needed after initial development or if multiple test forms are needed and can be adapted from existing forms. For example, it may be easier to adjust the computer-generated avatar than to reconvene actors and a video production crew to develop and iteratively revise the human version. In addition to such logistical and cost-related considerations, a choice between human signer and avatar signer should be guided by considerations that can affect score validity, notably the usability (e.g., comprehensibility) of the human and avatar videos.

Research Literature and Current Practices

Background on ASL

ASL is a manual/visual language, encoded not in sounds but in the shape, position, movement, and orientation of the hands (Liddell & Johnson, 1989; Stokoe, 1960; Stokoe, Casterline, & Croneberg, 1965), as well as in gaze, facial expression, and body position. ASL is grammatically distinct from English or any other spoken language, having evolved spontaneously among a community of users, as have all other natural languages.

Due to federal and state laws such as the Individuals with Disabilities Education Improvement Act (IDEIA) and the Elementary and Secondary Education Act (ESEA), policies about the use of ASL as an assessment accommodation are becoming more standardized across K-12 schools. Computer-delivered tests at the K-12 level for two Race to the Top Assessment (RttTA) consortia (PARCC and Smarter Balanced) offer recorded human signers in operational mathematics tests (Partnership for Assessment of Readiness for College and Careers, 2017, p.
31; Smarter Balanced Assessment Consortium, 2017, p. 18). Such accommodations may be helpful for reducing construct-irrelevant variance associated with the reading skill levels of D/HH students. Policy in Higher Education Assessments Prior to the passage of the Americans with Disabilities Act (ADA) in 1990, most D/HH students entering higher education headed to dedicated programs such as Gallaudet University or the National Technical Institute for the Deaf. As the ADA Amendments Act (ADA AA) of 2008 and other legislation expanded opportunities for individuals with disabilities to enter institutions of higher education, increasing numbers of D/HH students began enrolling in mainstream colleges, universities, and vocational programs. There is currently no uniform policy across institutions of higher education regarding ASL interpretation of assessments. The primary operative law at the postsecondary level is the ADA AA, which is not specifically focused on education, encompassing equal opportunity issues in employment and in government and commercial services as well. Disability service providers at different institutions of higher education tend to craft their own policies. These policies are loosely coordinated by listservs and through the participation of many United States campus disability service providers in the Association on Higher Education and Disability (AHEAD). AHEAD’s recommendations, however, serve as guidance and do not have the force of law. A D/HH postsecondary student requesting ASL interpretation commonly receives this accommodation for instructional purposes, subject to factors such as availability, the feasibility of alternative accommodations (e.g., real-time captioning), and course or program content. Policies toward interpretation of course-based tests, however, vary tremendously from one institution to another and from course to course within a single institution. Sometimes the variation is due to differing course requirements. For example, a nurses’ training program may require students to know medical terminology in English so that they can read, and quickly respond to, entries in patients’ charts or instructions from physicians. Translating an assessment in that domain into another language would alter the construct. Other courses may be less vocabulary-based but nonetheless include content that is dependent on written language. Similarly, on high-stakes standardized tests, states and testing agencies differ in their policies about ASL interpretation of test content, which, generally speaking, is less likely to be permitted as an accommodation on graduate admissions and professional licensure tests than for postsecondary courses, due to concerns with standardization, consistency and quality of translation, and test security (Johnson, Kimball, & Brown, 2001; Mason, 2005). Two Key Studies of Computer-based Delivery of ASL in Assessment The research literature on ASL translation of tests focuses almost entirely on the K-12 level. Cawthon et al. (2011) sought to measure the effects of an ASL accommodation for D/HH students in reading and mathematics assessments. Sixty-four fifth to eighth grade students (ages 10–15) participated. Each participant was exposed to both the standard (control) and ASL (experimental) conditions. In the standard condition, the directions were translated into ASL, but the reading passages, reading test items, and mathematics test items were provided only in written English. 
In the ASL condition, those same materials were provided both in ASL (via DVD) and in written English. The researchers found no overall differences in the mean percent of items for which students scored correctly in either reading or mathematics, noting that “It is possible that some of the readers… did not use the ASL accommodation as intended either because they did not attend to the accommodation or because they relied more on their reading skills…” (p. 207). Thus, the study did not reveal the extent to which participants actually relied on the signing. Russell et al. (2009) compared recorded human and avatar signers in the context of a computer-based signing accommodation using released NAEP grade 8 mathematics items with 96 D/HH students in grades 8 through 12. While using the human and avatar videos, participants had access to the English items. The researchers found that on a four-option Likert scale (strongly agree, agree, disagree, strongly disagree), 60% of participants agreed or strongly agreed that “It was easy to understand information presented by the signing avatar,” while 83% of participants agreed or strongly agreed that “It was easy to understand information presented by the signing human.” Notwithstanding this preference for human over avatar, there was no significant difference in mathematics performance based on the signed version. This study did not report which aspect of the presentation (English print or computer-based signing) the student relied on to comprehend and problem-solve the mathematics items. Therefore, the lack of group differences could be due to students accessing all available information to problem-solve. A later study by Higgins et al. (2016) did show better performance in mathematics questions by students in grades 3 through 12 when supported by videos of fingerspelling and/or ASL signers, when compared to a condition without such supports. Desirable Features of Additional Research Given the paucity of research in this area, and the limitations of previous research as discussed here regarding translation issues and reliance on signed versions, as well as the increasing requests for ASL test translations, there is a need to continue exploration of the issue of avatar signing versus human signing with a study design that addresses issues that may affect either the validity or the practicality of signed versions in operational mathematics testing. Features that would be desirable in a follow-up study would include the following (all of which the study described in this paper was intended to address): Implementing a testing situation in which participants need to rely on the signing (human or avatar) rather than the English text. Russell et al. (2009) and Cawthon et al. (2011) provided access to the English item for the whole time that the student interacted with the item, which meant that their mathematics performance was not a measure of the comprehensibility of the signing itself. Participants may or may not have actually depended on the signing. A well-controlled study is needed to clarify the usefulness of signed versions in operational mathematics testing. Translating into sign language only the English-heavy features of the item, since one rationale for the signing accommodation for D/HH individuals who use ASL is to bypass the English literacy challenges many of these individuals face. For example, one might consider not signing mathematics expressions, answer options, and/or graphics, since these are largely English-free. Russell et al. 
(2009) and Cawthon et al. (2011) translated full items. Supplementing data on mathematics scores with other indicators of comprehension, for example, having the participant translate items into English (in a context in which they need to rely on the signed versions and have not been exposed already to the English text of the items). By having an indicator that may not be as closely bound to mathematics proficiency, one may support better inferences about the comprehensibility of the signing. Other changes are important to consider, including expanding the sample to include post-high school participants (to make inferences more relevant to higher-education settings) and ensuring the use of a human signer who is RID1-certified with experience interpreting in the educational system. (Russell et al. (2009) did not specify the human signer’s qualifications.) Purpose of this Study The purpose of the current study was to compare the usability of human and avatar videos for Deaf individuals at the high school and post-high school level using pre-college mathematics items, shedding light on some issues still in need of clarification. For purposes of this report, upper-case “Deaf” refers to a deaf or hard of hearing individual who uses ASL. In this study, researchers sought to address the research needs mentioned above, among them a research design in which participants needed to rely on the signed (human or avatar) presentation of items rather than only on the English text. It should be emphasized that the goal of the current project was not to evaluate the usefulness of an ASL translation (versus no translation), but to compare the human and avatar versions. The research questions for the study are as follows: Which signed version is most comprehensible to Deaf participants? To what degree do Deaf participants prefer a human- or avatar-signed version of mathematics test questions? What kinds of usability challenges do Deaf participants experience when using the avatar and human presentation versions? What suggestions do Deaf participants make for improving the usability of signed versions? Is there an association between the reported age of starting to learn ASL and either mathematics score or translation score? Is there evidence that participants benefited from a second signed version or from the full English version? What is the relationship between mathematics score and translation score? What is the relationship between translation score and signed version? Are there indications about the adequacy of the full-English version by itself? Is there a relationship between highest mathematics course taken and total mathematics score? Method Research Design The study was a repeated measures design in which each participant was presented with and responded to each of 10 single-selection multiple choice math items in three conditions – human-signed, avatar-signed, and non-signed. In general, the math score obtained under each condition was taken as an indicator of the usability (e.g., comprehensibility) of that condition. For each of the signed conditions (human [prerecorded] and avatar), participants were asked to translate the item into English. These translations were later rated by experts for their quality in conveying the meaning of the English item. Thus, the translation score was another potential indicator of the usability (e.g., comprehensibility) of the signed versions. Presentation and response formats There was a separate session for each participant. 
For the two signed conditions – human (prerecorded) and avatar – the item was presented on the computer screen. Specifically, on the right side of the screen was the "partial" English math item, that is, the item with the English-heavy part grayed out (and therefore not visible), and on the left side of the screen was a window for the video of the signer (human or avatar). This is shown in Figure 1, without the navigational controls.

Figure 1. The video window (avatar) on the left and the partial English version on the right. The screen shot shows the video of the signer (in this case, the avatar) on the left and the partial English math item on the right; here the item shows a picture of a scale with an arrow pointing to a value, and there are five choices from which the test taker selects one.

In the signed videos only the English-heavy portions of the item were signed (thereby potentially overcoming an accessibility barrier by reducing non-construct-relevant demands for knowledge of English). The non-signed condition involved presenting on paper a full English version (nothing grayed out) without any signing, as shown in Figure 2.

Figure 2. The full English version of the item, as it appears on paper. The item is the same one as in Figure 1 (with the arrow and the scale).

Note that in the signed conditions, the participant was obliged to rely on the signing video as well as the partial English version in order to correctly answer the item. For all three conditions of all items, participants recorded their answers on paper.

Order of conditions

The two signed conditions were always presented before the non-signed condition (full English version). The order of the signed conditions for a given participant was specified by the form (form A or form B) to which the participant was assigned.

Forms A and B

Participants were assigned either to form A or form B in alternating fashion as their scheduled session with the researcher occurred (e.g., first participant to form A, second to form B, third to form A, and so on). Table 1 shows the order of human and avatar conditions for the first three items; the same pattern continued for the seven other items. As implied in Table 1, all 10 items were presented in the same order to each participant.

Table 1. Order of human and avatar conditions for the first three items for forms A and B

Item number   Form A: video 1   Form A: video 2   Form B: video 1   Form B: video 2
1             Avatar            Human             Human             Avatar
2             Human             Avatar            Avatar            Human
3             Avatar            Human             Human             Avatar

Note: Video 1 = the signed condition administered first; video 2 = the signed condition administered second.
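For readers who prefer the counterbalancing rule in executable form, the following is a minimal Python sketch that reproduces the alternation shown in Table 1, assuming (as stated above) that the same pattern continues through item 10. The function name is ours and is purely illustrative; it is not part of the study's materials.

# Hypothetical sketch: generating the video 1 / video 2 signer order per form.
def signed_condition_order(form: str, item_number: int) -> tuple:
    """Return (video 1 signer, video 2 signer) for a form ("A" or "B") and a 1-based item number."""
    # Form A starts with the avatar on odd-numbered items; form B is the mirror image.
    avatar_first = (item_number % 2 == 1) if form == "A" else (item_number % 2 == 0)
    return ("Avatar", "Human") if avatar_first else ("Human", "Avatar")

for item in range(1, 4):
    print(item, signed_condition_order("A", item), signed_condition_order("B", item))
# 1 ('Avatar', 'Human') ('Human', 'Avatar')
# 2 ('Human', 'Avatar') ('Avatar', 'Human')
# 3 ('Avatar', 'Human') ('Human', 'Avatar')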
Thus, in summary, each participant received each item three times, in the following order:

1. First signed condition (video 1): the avatar-signed or human-signed condition, as specified by the assigned form.
2. Second signed condition (video 2): the opposite signed condition.
3. Non-signed condition: the full English version.

Strategy behind the inclusion of the second signed condition (video 2) and the third condition (full English version)

Obviously, a person's performance in a given condition could be influenced by earlier conditions. For example, performance for the second signed condition (video 2) for a given item could be influenced by experience in the first signed condition (video 1), and performance in the non-signed condition (full English version) could be influenced by experience with both video 1 and video 2. These could be seen as a form of "contamination." It would have been possible to have a design that, for a given item, simply used video 1, because it is very arguable that the score obtained from the first condition for a given item would be the purest indicator of ability; performance in later conditions might be affected by exposure to the item in the earlier conditions. Most of the analyses in this study do in fact focus on video 1 (rather than video 2). However, including video 2 as we did allowed us to attempt to detect the influence of video 1 on performance for video 2. Furthermore, the non-signed condition (i.e., the full English version shown in Figure 2) was included to provide, for comparison, a situation that would arguably be relatively optimal for performance: the participant had access to the full English version of the item (nothing grayed out) and had already seen both signed versions. Thus, the inclusion of the video 2 and full English version conditions plays a valuable role in this exploratory study, which can inform designs for future larger-scale studies.

Sample

Following approval from the human subjects review board at Rochester Institute of Technology's National Technical Institute for the Deaf (RIT/NTID), participants were recruited from high schools and colleges in western New York State. To be eligible for participation, students needed to be (a) in high school or post-high school, (b) self-identified as Deaf or hard of hearing (not hearing), (c) self-reported as users of both ASL and written English, and (d) recipients of a grade of "C" or better in their first-semester algebra course. (Four candidates were not eligible due to lower algebra grades.) Researchers sought an approximate 50–50 split between high school and post-high school participants. Participants received gift cards for their participation. The final sample included 12 individuals at the high school level and 19 at the post-high school level. There were 18 males and 13 females; 23 characterized themselves as Deaf and 8 as hard of hearing. The sample had a mean age of 20.35 years (SD = 4.47).
For self-report of race/ethnicity, 21 were White (not Hispanic), 3 were Hispanic, 2 were Black or African American, 2 were Asian or Pacific Islander, and 3 categorized themselves as "Other." Regarding the individual's first language, 21 participants chose ASL or other signing modality (e.g., "sign language," "ASL and PSE") and the remaining 10 individuals chose English or another spoken language. The mean age for first exposure to ASL was 5.27 years (SD = 6.08). Participants generally reported good grades in the first semester of high school algebra: 15 indicated "A," 13 indicated "B" (including 1 "B+," 1 "A or B," and 1 "B or above"), and 3 indicated "C."

As shown in Table 2, by and large the participants considered themselves capable in both ASL and English, as indicated by their levels of agreement with literacy questions, though participants indicated the least agreement with the statement: "It is easy for native English users to understand my English writing."

Table 2. Self-report of ASL and English proficiency

Question                                                                   SD   D   N    A   SA
1. It is easy for Deaf signers to understand my ASL                         0   0   1    9   21
2. It is easy for me to understand native Deaf ASL signers                  0   1   3   10   17
3. It is easy for native English users to understand my English writing     0   1  10  13a    7
4. It is easy for me to read and understand written English                 0   0   0   13   18

Note: SD = strongly disagree; D = disagree; N = neither agree nor disagree; A = agree; SA = strongly agree.
a One individual responded both "agree" and "strongly agree." This was categorized as "agree."

Materials

Items

This section describes the math items, including their mathematics content, and how the items were developed and delivered. The items were used to obtain the math score (one for each of the three conditions) and the translation score (one for each of the two signed conditions).

Mathematics content

The 10 mathematics items were selected from a set of items that had been used as practice materials for an existing standardized mathematics test. The test addressed pre-college mathematics and included content that would be encountered in typical middle school or early high school mathematics classes. All 10 items had a multiple-choice response format with five answer options.
Key criteria for selecting items were that half (i.e., five) be judged by two experts in education and ASL, one of whom also has expertise in mathematics education, as “easy” to translate from English into ASL, and that the others be judged as “hard” to translate. Three of the items had graphics. Creation of the ASL videos of the signed versions In developing the two signed versions, the intent was to provide translations into ASL that were as comparable as possible to those of native ASL signers. However, each of these versions differs in some respects from native ASL. Therefore, we will refer to the avatar version as “digital ASL,” to indicate that it was produced electronically (i.e., using a computer-based authoring environment), and the human version as “L2 ASL” to indicate that the signer (an RID-certified interpreter) had learned ASL as a second language. Only the English-heavy portions of the item stems were interpreted into ASL. Figures, which were present in two of the 10 items, occurred only as stimuli (rather than as part of the item stem or response options) and were not interpreted, nor were response options. The rationale behind this decision includes (1) that a major factor in D/HH individuals’ challenges in understanding mathematics is the English components, which on this mathematics test are primarily in the stems; (2) that mathematics expressions are in themselves English-free, and (3) that English is ordinarily used sparingly in graphics (figures) within mathematics items, primarily for labels (e.g., “angle B”). It was assumed that D/HH individuals would generally experience considerably less difficulty with such labels than with more complicated connected text. In addition, response options are often single words, letters, numbers, or brief mathematical expressions; so often they too present less English complexity than stems, which may involve more complicated grammatical structure. To minimize screen clutter, researchers also decided not to include graphics and written mathematical expressions in the ASL delivery because this content would already be present as on-screen text, along with the response options. Prior to the recording session, researchers and ASL expert consultants conferred with mathematics test developers to ensure that translations were not only linguistically correct but also maintained construct validity. Procedure for creating the human version An RID-certified sign language interpreter with over 10 years’ experience interpreting for D/HH students in K-12 schools served as the human signer. Developing the human version, created before the avatar version, involved an iterative process: the expert consultants (a Deaf ASL/English bilingual researcher with experience translating high-stakes test items into ASL, and a hearing researcher with superior ASL fluency and expertise in mathematics education, with assistance from a member of the research team with experience in both test development and ASL linguistics research) agreed on a translation and worked with the signer to rehearse and refine it before recording. For some items, one of these consultants recorded rough self-videos as models for the interpreter. The team provided the interpreter with an enlarged printout of an English-word glossing of the ASL translation to use as a memory aid while signing. The videos were recorded to show the typical signing space, that is, head to hips, excluding legs. 
After each recording of each item, the team, including the signer, watched and critiqued the video. The signer rehearsed and then recorded the translation again, if necessary. This process was repeated until the team was satisfied; then they moved on to the next item. This production process occurred over a two-day period. Procedure for creating the avatar version The avatar version was created by a company specializing in signing avatars (Vcom3D).2 Resources provided to Vcom3D included the English items and some general instruction (e.g., use a white male avatar to match the race/ethnicity and gender of the human signer – the intention being to eliminate race/ethnicity as a confounding variable in the study); the same English-word glosses and videos provided to the human signer; some minimal direct question-and-answer with the researchers; and feedback from researcher review. Vcom3D was never given access to the videos of the human signer so as not to provide a resource that would be unavailable in an authentic situation in which avatar signers (instead of human signers) might be used in test development. On average, each item underwent about 2 1/2 review cycles before completion. This process occurred over several weeks. Every revision was based on a review by the same three-person team that worked on the human versions. Delivery of the items and recording of responses As mentioned earlier, the two signed conditions were delivered on computer (specifically, a 15-inch high-resolution Apple Mac laptop) and the non-signed condition (full English version) was delivered on paper. Responses to all items were recorded on paper. The participant also recorded the translation of the item into English on the item response sheet. Participants provided two translations – one for each of the two signed conditions. Participants were able to pause and resume the video as well as use a slider to navigate to any portion of the video. Accompanying the on-screen presentation were a test booklet and English version forms. Test form booklets The test form booklet (for either form A or form B) used by each participant consisted of 10 pieces of 8 1/2 by 11-inch paper printed double-sided, with two printed sides (item response sheets) per item – one for the first signed condition (video 1) and one for the second signed condition (video 2). Each item response sheet showed the partial English version of the item (i.e., appropriately grayed out, just as it was presented on screen for videos 1 and 2). The item response sheets were used to capture participants’ answers to the mathematics questions and their translations of items into English for each of the two signed conditions. English version sheet Participants’ answers to the mathematics questions for the full English version were collected on the English version forms. This consisted of one 8 1/2 by 11-inch sheet for each item. This sheet was essentially the same as the item response sheet, but showed the full English version rather than the partial English (i.e., partially grayed out) version. The purpose was to capture students’ responses to the mathematics items for the non-signed (full English version) condition. 
Other instruments

Other instruments included the background interview (e.g., age, ethnicity, hearing status, self-reports of ASL and English proficiency, mathematics background), post-video questions (e.g., ease of understanding the signing), post-pair questions (e.g., which signed version was easiest to understand), the post-English version form, and the post-item-set questions (PISQ).

Session Procedure

The procedure for each participant was as follows:

1. The proctor assigned the participant to form A or B and gave instructions to the participant.
2. For each of the 10 items, the participant:
   (a) viewed video 1, answered the mathematics item (by marking the response sheet), translated the item into English (on the response sheet), and then answered the post-video questions;
   (b) viewed video 2, answered the item again, translated it again into English, and answered the post-video interview questions for video 2;
   (c) answered the post-pair interview questions; and
   (d) answered the item again with access to the full English version, and answered the post-English-version interview questions.
3. After completing all 10 items, the participant answered the post-item-set questions (PISQ).

Each session lasted about 90 min. All participants communicated with the proctor in ASL.

Data Analysis

While the number of participants was relatively small (n = 31), a number of statistical analyses were performed. For each of the 31 students there were nine questions analyzed (item 6 was excluded from the 10 originally administered; see Note 3). Of the nine items that we analyzed, two items, 4 and 7, used graphics. One item used a mathematics expression in the stem. As mentioned earlier, we often used data from video 1 (rather than either video 2 or a union (pooling) of data from video 1 and video 2). This is because we considered that an initial exposure to an ASL version (video 1) best reflected the use of ASL in an actual testing situation (i.e., the test taker answers the item without already having seen a different signed presentation of the item).

Scoring of participant translations

The translation score was determined by two researchers with superior ASL fluency and expertise in mathematics education. The two raters each assigned each translation (ASL to English) a score on a scale of 0–5, with 0 representing no response and 5 representing a near-perfect translation. (Initially, one rater inadvertently failed to rate 5 of the translations.) Researchers agreed that generally, if the raters' translation scores differed by one point, this would be considered acceptable, indicating near-agreement on the accuracy of the translation. Any discrepancy of more than one point between the two raters' scores, or between scores of 2 (unacceptable translation) and 3 (minimally acceptable translation), was addressed and reconciled in a single two-hour meeting, facilitated by the ASL-knowledgeable member of the research team. At the beginning of that session, the five missing ratings were supplied by the rater who had omitted them (who, unfortunately, had already received exposure to the other rater's scores). Then the discrepancies were reconciled; in every case, the raters quickly reached a consensus score. The final score used in later analysis consisted of (a) the agreed-upon score where there was exact agreement, (b) the consensus score resulting from the reconciliations, or (c) the higher of the two values where the scores did not require reconciliation.
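To make the combination rule concrete, here is a minimal Python sketch of the scoring logic just described. The function names are ours and the example calls use invented ratings; this is an illustration of the stated rule, not code used in the study.

# Illustrative sketch of combining two raters' 0-5 translation ratings.
def needs_reconciliation(r1: int, r2: int) -> bool:
    # Discrepancies of more than one point, or any 2-versus-3 disagreement
    # (unacceptable vs. minimally acceptable), went to the reconciliation meeting.
    return abs(r1 - r2) > 1 or {r1, r2} == {2, 3}

def final_translation_score(r1: int, r2: int, consensus: int = None) -> int:
    if r1 == r2:                       # (a) exact agreement: use that score
        return r1
    if needs_reconciliation(r1, r2):   # (b) use the consensus from the meeting
        assert consensus is not None, "a reconciled score is required here"
        return consensus
    return max(r1, r2)                 # (c) otherwise, the higher of the two ratings

print(final_translation_score(4, 4))               # -> 4
print(final_translation_score(3, 4))               # -> 4 (adjacent scores, no reconciliation)
print(final_translation_score(2, 3, consensus=3))  # -> 3 (2/3 boundary, reconciled)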
Reliability of scoring of translations

Translation raters 1 and 2 showed considerable agreement in their evaluations of the quality of participants' ASL-to-English translations. The translation score of rater 1 exactly matched that of rater 2 58.6% of the time, and was within one point of rater 2's score (including exact matches) 90.6% of the time. The unweighted kappa value was .46 (standard error = .026, p < .001). The five translations for which ratings were not initially offered by one rater were omitted from the agreement calculations, yielding a total of 553 (instead of 558) translations for the agreement statistics (with two raters per response).

Coding of levels of agreement

For the purposes of analysis, the levels of agreement on Likert scales were converted to numbers as follows: strongly disagree (SD) = 1, disagree (D) = 2, neither agree nor disagree (N) = 3, agree (A) = 4, strongly agree (SA) = 5.

Tabulation and statistical significance

Results were tabulated, and in some cases statistical tests were run. Unless otherwise stated, we used an alpha level of .05 for all statistical tests.

Basic description of participants

All 31 of the participants answered all nine items. Specifically, each of the 31 participants responded to each mathematics item three times, once for each of the two videos and then once for the full English version. Each of the 31 participants made two translations from ASL, one for video 1 and another for video 2. Of the 31 participants, 16 were assigned to form A and 15 to form B. For the highest level of mathematics taken (based on self-report), the numbers of participants were as follows: 3 at level 1 (algebra 1), 3 at level 2 (geometry), 11 at level 3 (algebra 2 and trigonometry), 10 at level 4 (pre-calculus), and 4 at level 5 (calculus).

Form effect

There was not a significant form effect with respect to mathematics score (for video 1), based on a t-test for equality of means for form A (M = 3.31, SD = 1.92) and form B (M = 4.27, SD = 2.09), t(29) = 1.32, p = .195, two-tailed.

Different aggregations of math scores

In the analyses conducted, math scores (as well as translation scores) were aggregated in various ways. For example, for a given participant, item, and condition, the math score is simply a 1 (correct) or a 0 (incorrect). A math score for an individual across the nine items will depend, for example, on whether it is summed by video (e.g., all scores for video 1), by signed condition (e.g., all scores for the avatar-signed condition), or by the full English version (non-signed condition). Math scores for groups of participants can be based on any of these scores for individuals.

Completeness of key data

For 31 participants and nine items, we obtained three math scores (video 1, video 2, and full English version) and two translation scores (video 1 and video 2).

Results

Key findings regarding the research questions were as follows.

1. Which Signed Version Is Most Comprehensible to Deaf Participants?

Two key indicators of the comprehensibility of the signing are (a) the relationship between mathematics score and signed version and (b) the reported quality of signing.

Relationship between mathematics score and signed version

For video 1, there was no significant relationship between math score (1 or 0) and signed version (avatar versus human), chi-square(1, N = 279) = .030, p = .863. Note that this statistic is based on the math scores of 31 participants for each of the nine items.
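As an illustration of the form this test takes, the following sketch runs a 2 x 2 chi-square test of independence with SciPy. The cell counts below are placeholders chosen only to show the layout (correct/incorrect by avatar/human for video 1); they are not the study's data.

# Illustrative only: placeholder counts, not the study's observed frequencies.
from scipy.stats import chi2_contingency

#               correct  incorrect
contingency = [[60,      79],   # avatar as video 1 (hypothetical counts)
               [58,      82]]   # human as video 1  (hypothetical counts)

# Yates' continuity correction is applied by default for 2x2 tables;
# pass correction=False to omit it.
chi2, p, dof, expected = chi2_contingency(contingency)
n = sum(map(sum, contingency))
print(f"chi-square({dof}, N = {n}) = {chi2:.3f}, p = {p:.3f}")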
Results were similarly not significant for video 2, chi-square(1, N = 279) = .430, p = .512, and for the union of data from video 1 and video 2, chi-square(1, N = 558) = .117, p = .733. Thus, notwithstanding the higher perceived quality of the human version (see "Reported quality of signing" below), no significant difference in math score based on signed version was detected. This lack of significant difference could have been due to the small sample size.

Reported quality of signing

Participants tended to rate the quality of the signing higher for the human version than for the avatar version. Specifically, 61.3% of participants responded "very good" or "excellent" to the question about the quality of the human translation, while for the same question for the avatar, only 9.7% of participants responded "very good" or "excellent." (Options were excellent, very good, good, fair, and poor.)

2. To What Degree Do Deaf Participants Prefer a Human- or Avatar-Signed Version of Mathematics Test Questions?

Preference was addressed by two questions. First, in response to the question, "Which kind of signing did you prefer for mathematics items – avatar or human? (avatar, human, no preference)," 29 of 31 participants preferred human, 1 preferred avatar, and 1 indicated "no preference." Second, participants also indicated greater agreement with a recommendation of the human version over the avatar version for mathematics tests. Specifically, in response to the statement, "I would recommend the human version in mathematics tests for high school or higher education students who are deaf or hard of hearing" (emphasis added), 26 participants (83.9%) agreed or strongly agreed. For the same question regarding the avatar version, only five participants (16.1%) agreed or strongly agreed.

3. What Kinds of Usability Challenges Do Deaf Participants Experience When Using the Avatar and Human Versions?

Avatar. The main challenge identified for the avatar versions was the relative lack of facial expression and body language (e.g., "There are no facial expression and mouth doesn't mouth the words"; see Note 4). Besides indicating affect, these features are essential to ASL grammar. Although the facial expressions and body positions of ASL were present to some degree in the avatar versions, they were by no means as fully developed or fluent as in human signing. Some participants mentioned not being familiar with signing avatars and so found them difficult to understand (e.g., "Usually see human signing – not avatar – so I was not used to translating avatar's ASL to English").

Human. Participants noted few challenges specific to the human versions, apart from one person's observation that the human signer should pause more frequently.

Human and avatar. Some challenges were noted as common to both the avatar and human versions: some vocabulary (perhaps the mathematics-related vocabulary, though this was not specified) was unfamiliar, and some signs were difficult to translate into English. (Some bilingual signers with limited translation experience may not know which English word to use to translate a given ASL sign, even if they understand the sign itself.)

Check on visual appearance. Participant responses suggest that a large majority of participants found the computer- and paper-based materials visually adequate.
A clear majority of participants responded "yes" to the following questions: "Were the videos easy to see (big enough and sharp enough image)?" (30 "yes" responses); "Were the background colors for the avatar and human acceptable?" (29 "yes" responses); "Was the lighting for the avatar and human acceptable?" (31 "yes" responses); "Were the English items on computer easy to see?" (26 "yes" responses; see Note 5); and "Were the English items on paper easy to see?" (31 "yes" responses).

4. What Suggestions Do Deaf Participants Make for Improving the Usability of Signed Versions?

Participants made several suggestions for improving the signed versions (in the post-item-set questions, question 19). For the avatar version, key suggestions centered on issues such as facial expressions and mouthing, the overall fluency/fluidity of signing, and the fingerspelling and signing of numbers. For the human version, most participants were fairly satisfied, apart from the general issues that applied to both the avatar and human versions. Among the suggestions relevant to both the avatar and human versions were the following:

(1) Use human, not avatar, signing; or do not use video at all, but instead a live signer. This reflects the clear preference for the human signer over the avatar.

(2) Caption all videos, or have a full English version available on paper or elsewhere. Many participants felt strongly that they, or other D/HH individuals, would have benefited from seeing full English versions along with ASL. (This, as noted, was not possible due to design considerations: displaying English along with ASL would have made it impossible to evaluate the ASL's usefulness, as distinct from the usefulness of the English text.) Of course, in actual use, if videos of ASL were used for test content, the test taker would have access to the English version of the test as well.

(3) Use more mouth movement and facial expression.

(4) Sign English, not ASL. One of the research consultants reported that a number of the participants were unfamiliar with mathematics content presented in ASL. Participants had apparently learned mathematics from written (or on-screen) English, along with graphics, and/or from signed instruction in a sign system that uses ASL signs in roughly English word order. For these individuals, any potential advantage of the ASL versions may have been neutralized by their unfamiliarity with the presentation of mathematics content in ASL.

5. Is There an Association Between the Reported Age of Starting to Learn ASL and Either Mathematics Score or Translation Score?

One-way analyses of variance did not indicate a significant association between participants' reported age of starting to learn ASL and either their total mathematics score for video 1, F(4, 25) = .75, p = .566, or their total translation score, F(4, 26) = .85, p = .507.

6. Is There Evidence That Participants Benefited From a Second Signed Version or From the Full English Version?

Participants' total mathematics scores for video 1 (M = 3.77, SD = 2.03) and video 2 (M = 4.10, SD = 1.96) were not significantly different from each other based on a paired-samples t-test, t(30) = 1.54, p = .134, two-tailed. However, participants' mathematics scores for the full English version (M = 4.61, SD = 2.43) were significantly different from both the video 1 mathematics scores, t(30) = 3.14, p = .004, two-tailed, and the video 2 mathematics scores, t(30) = 2.15, p = .040, two-tailed.
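The following sketch shows how such paired comparisons can be computed with SciPy's related-samples t-test. The per-participant totals below are randomly generated placeholders, not the study's scores; they only illustrate the shape of the data (one total score of 0-9 per participant per condition, in the same participant order).

# Illustrative sketch of the paired (within-participant) comparisons.
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(0)
video1  = rng.integers(0, 10, size=31)   # placeholder totals for video 1
video2  = rng.integers(0, 10, size=31)   # placeholder totals for video 2
english = rng.integers(0, 10, size=31)   # placeholder totals for the full English version

for label, a, b in [("video 1 vs. video 2", video1, video2),
                    ("video 1 vs. full English", video1, english),
                    ("video 2 vs. full English", video2, english)]:
    t, p = ttest_rel(a, b)               # df = n - 1 = 30 for 31 paired observations
    print(f"{label}: t(30) = {t:.2f}, p = {p:.3f}, two-tailed")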
Thus, we did not detect a significant difference (e.g., growth) between video 1 and video 2, yet exposure to the full English version did appear to boost mathematics scores. The key finding, then, is that, as expected, participants performed better in the non-signed condition (full English version), having already seen the earlier signed conditions, than they performed in those signed conditions (video 1 and video 2). Although there was no significant difference between performance on video 1 and video 2, the observed difference was in the expected direction, with video 2 being higher than video 1.

Analysis by item, showing the number of participants answering correctly

As shown in Table 3, for the most part, the number of people answering a math item correctly in the non-signed condition (full English version) was equal to or higher than the number answering correctly in the video 1 and video 2 conditions. (This is indicated by "No" in the last row of Table 3.) This was expected because the non-signed condition allowed access to the full English item after both the human and avatar videos had already been seen. As may be seen in Table 3, this did not occur for items 1 and 7 (see the "Yes" in the last row). We do not know why, for items 1 and 7, the number of people answering correctly for the full English version was lower than for the signed conditions (video 1 and video 2); it may have been due to random noise.

Table 3. Number of people answering the item correctly for video 1, video 2, and the full English version

Item                                            1    2    3    4    5    7    8    9   10
Video 1                                        18   10   14   19   16    8    6   11   15
Video 2                                        17   12   14   20   15    9   13   13   14
Full English version                           16   16   17   21   16    6   15   20   16
Full English version lower than both
signed versions (video 1 and video 2)?        Yes   No   No   No   No  Yes   No   No   No

Note: As mentioned earlier, item 6 was not used for the analysis.

Also as shown in Table 3, item 4 was the easiest item (the most people answering it correctly), while item 7 was the most difficult item (the fewest people answering it correctly). Item 7 required the student to identify from a list of choices the fraction that was halfway between two other given fractions.

7. What Is the Relationship Between Mathematics Score and Translation Score?
Participants' total mathematics scores for video 1 (out of nine possible points; M = 3.77, SD = 2.03) were strongly correlated with their total translation scores (out of 45 possible points [9 items times 5 possible points per translation]; M = 21.71, SD = 6.02), r = .655, p < .001.

8. What Is the Relationship Between Translation Score and Signed Version?

For video 1, there was a slight but nonsignificant association between translation score and signed version, chi-square(5, N = 310) = 7.92, p = .161. There was also no significant association between translation score and signed version for video 2, chi-square(5, N = 310) = 1.44, p = .920, or for the union of data from video 1 and video 2, chi-square(5, N = 558) = 2.85, p = .724.

9. Are There Indications About the Adequacy of the Full-English Version by Itself?

The value of the English version by itself was not directly evaluated in this project. Direct evaluation might have required examining participant performance with a full English version both with and without exposure to a signed version. (By contrast, in the present study, participants saw the full English version of an item only after having been exposed to both signed versions.) Nevertheless, 93.2% either agreed or strongly agreed with the statement, "The English version alone (without any signing) would be enough for me to answer this item correctly" (SD = 1, D = 2, N = 3, A = 4, SA = 5).

10. Is There a Relationship Between Highest Mathematics Course Taken and Total Mathematics Score?

A one-way analysis of variance (ANOVA) was used to test for differences in total mathematics score (for nine items) for video 1 among the levels of participants' highest mathematics course taken (1 = algebra 1; 2 = geometry; 3 = algebra 2/trigonometry; 4 = pre-calculus; 5 = calculus). We found no significant difference in participants' total mathematics scores for video 1 based on the participants' highest mathematics courses taken, F(4, 26) = .79, p = .540. The number of participants at some levels was small (n = 3 for two levels and n = 4 for one level), so it is not possible to attach much practical significance to this finding.

Discussion

The expectation that there would be differences in usability between the avatar- and human-signed versions of the mathematics items was only partly supported by this research study. Participants showed a clear-cut preference for human signing over the avatar versions, as was observed by Russell et al. (2009), and believed that the quality of signing in the human version was higher than that of the avatar version. This study also appears consistent with the finding by Russell et al. (2009) that signed version (human versus avatar) was not significantly associated with mathematics score. Furthermore, in this study, signed version was not significantly associated with translation score. Thus, neither mathematics scores nor translation scores provided significant evidence that one version was more comprehensible than the other.

Mathematics score and translation score were strongly correlated (see research question 7). It is difficult to say which way the causation runs; is mathematics content knowledge (as well as language knowledge) required to produce an accurate translation, or is language comprehension prerequisite to understanding the mathematics content as presented? Presumably, each of these plays a role.
One of the researchers observed that those participants who struggled to translate the signed versions into English did not appear to grasp the underlying mathematics content very well. This is typical of translation situations: understanding the content is crucial to a good translation. But the converse is likely true as well, that understanding the language in which content is presented is essential to understanding that content. That is, the students who use ASL themselves socially, but whose mathematics education has been in English, may experience difficulty in understanding even familiar mathematics content when it is presented in ASL. An explanation for such a difficulty could be informed by studies that point to the strength of the language of instruction for bilingual students and adults (Ardila & Rosselli, 2017; Martinez-Lincoln, Cortinas, & Wicha, 2015). These studies suggest that learners tend to revert to the language in which they learn mathematics regardless of whether that language is their native language when solving problems or recalling mathematics concepts. Although this can be reversed with high proficiency in the second language, it may be the case here that students resorted to English, the language in which they first learned the needed mathematics concepts. This would also support the previous hypothesis that the participants did not have high-level cognitive/academic language proficiency (CALP; Baker, 2006) in ASL that would have allowed them access to the problems. An alternate explanation might be that the interpreter’s signing was not clear to the student, and therefore created difficulties in translating the item from ASL to English. Future studies may benefit from cognitive debriefing the draft item translations with student users in addition to the research team prior to finalizing the ASL translations. The lack of differences in the comprehensibility (as indicated by mathematics scores and translation scores) between avatar and human versions, despite clear discrepancy in satisfaction with these versions, may be due to any of several reasons. For example, it may be due to small sample size. Also, it may be due to less familiarity with mathematics presented in ASL, which, of course, raises the issue of the need for such translations. However, we would argue that the need for such translations stems from the low achievement of these students. If presentation and instruction in English is not affording students success in mathematics, perhaps students should be given the opportunity to learn mathematics through ASL with accurate and appropriate signs. This could yield a greater understanding of mathematics as presented in ASL, leading to greater achievement. Despite some movement in this area, for most students, this has yet to be a reality. Another part of the explanation may be due to a trend that is consistent with current phenomena in deaf education and educational interpreting. Many D/HH students have become accustomed to working with educational interpreters and deaf education teachers with a wide range of signing proficiency (Schick, Williams, & Kupermintz, 2006). When the interpreter or teacher is not proficient in ASL, ASL-signing students must try their best to understand signing that may be semantically unclear and that may differ from ASL in significant ways, including the absence or misuse of ASL grammatical markers conveyed by gaze, face, and body. 
Since the model signer in our study was a K-12 certified interpreter with extensive experience in mainstreamed school settings, it is possible that some of our participants, particularly those who are accustomed to Deaf teachers’ ASL signing style, had difficulty comprehending this interpreter’s signing style. Schick, Williams, and Kupermintz (2006) find significant variation in the types of signing used, and the interpreting skills, of educational interpreters. If D/HH students’ instructional language for mathematics has been ASL (or some other form of signing), it is highly likely that despite our efforts to use accurate ASL, the signing may have been quite different from what students have seen in instructional settings. Another consideration that may have influenced results of the study is participants’ mathematics proficiency levels. Note that participants’ total mathematics scores for video 1 had a mean of 3.77 (SD = 2.3) out of 9 possible points (1 point per item), which is arguably low, given that the items addressed pre-college mathematics and included content that would be encountered in typical middle school or early high school mathematics classes. The current study used math score as an indicator of the usability (e.g., comprehensibility) of the given conditions rather than an indicator of mathematics proficiency. However, the relatively low scores raise the issues of whether the underlying mathematics proficiency of participants was in fact low relative to the difficulty of the items and that results might be different if there were a closer match between the mathematics difficulty of the items and the mathematics proficiency of the participants. Though not a major focus of the current study, it is worth noting that given the current state of the art, producing a good avatar version may not be significantly less expensive than a good human version. Both versions involved multiple cycles of revision. Much of the savings in using avatar versions would likely accrue where corrections need to be made. Revising the avatar version can be done on computer, but revisions to a human version could involve substantial costs for actors and video production staff. It appears that the key challenge identified for the avatar versions was the relative lack of facial expression and body language. These features are essential to ASL grammar, and the facial expressions and body positions of ASL were not nearly as fully developed or fluent as in human signing. Furthermore, gaze direction, which is crucial to ASL grammar and discourse, was not used meaningfully in the avatar versions. None of the participants mentioned gaze specifically, but they may have thought of it as a component of facial expression. Limitations The study had several limitations. First, although the goal was to use ASL, the digital ASL (avatar) and L2 ASL (human) signers did not make full use of certain non-manual aspects of ASL. The human interpreter’s non-native ASL acquisition could have resulted in incomplete acquisition of these components of the language. Additionally, the relatively small sample size (only 31 participants) and the small number and variety of items (e.g., nine items, all multiple-choice, only pre-college mathematics) limit the interpretation of some of the study results. Also, there was a lack of a counterbalanced design (e.g., the full English version always occurred last), which made it impossible to cancel possible order effects. 
A further limitation, one that may have worked against acceptance of the avatar, was participants' lack of familiarity with avatar signing. In addition, the study did not compare a version in which mathematics expressions were signed and graphics described in ASL with a version, such as the one presented here, in which only the English-heavy components of the item were signed. Finally, five of the 558 ratings were obtained (as noted above) under less-than-ideal conditions (i.e., with one rater having already seen the other rater's scores). It should be kept in mind that the goal of the current project was to compare the human and avatar versions rather than to evaluate the usefulness of an ASL translation (versus no translation).

Suggestions for Future Research

Additional research is needed to address remaining issues. As noted earlier, there is a need to determine whether signed presentation of test content has an effect, positive or negative, on mathematics performance. This would involve evaluating the comprehensibility of mathematics items with and without ASL videos. Such a study should oblige participants to rely on their English proficiency alone (without first having been exposed to a signed version, as they were in the current study). There is also a need to find ways to improve the avatar's expression (face [including gaze], mouth, and body), the clarity of fingerspelling and numbers, and the fluidity of the signing. Furthermore, best practices need to be established for signing figures, mathematics expressions, and response options. If the items are usable (and if other validity criteria are met) without signing these elements, this would suggest that signing these non-English portions of mathematics items may not be necessary. To ensure that, to the greatest extent possible, the signed version is strongly ASL, it is suggested that future studies similar to the present one involve a Deaf, certified deaf interpreter (CDI) as the human interpreter and/or a certified educational interpreter, who would have content knowledge as well as an understanding of how to interpret academic concepts (Hutter & Pagliaro, 2016; Schick & Williams, 2004).

Significance

This study was intended to take important initial steps toward informing testing organizations about the usability of ASL videos (human and avatar) in the context of mathematics assessments. The study gains importance from its finding of no significant association between signed version and either mathematics score or translation score. It goes beyond earlier studies in two key ways. First, it obliged the participants to rely on the signed versions, thereby allowing better inferences about the impact of the signing itself, rather than letting the signing's effect be confounded with the impact of participants' understanding of the English text. Second, by using translation score, this study provided an additional indicator (beyond mathematics score) of participant comprehension of the signed versions. Additionally, the study provides a useful snapshot of the current state of avatar technology. These initial results suggest that improvements to avatar videos are needed before their user acceptance will equal that of human videos.

Notes

1. Registry of Interpreters for the Deaf.
2. Clymer, Geigel, Behm, and Masters (2012) cite Vcom3D as one of several organizations having a state-of-the-art signing avatar system.
3. Through analysis of participants' translations from ASL into English, it was found that for one item a significant piece of information was missing from both the human and avatar versions. Because of this missing information, data from this item were not included in the results. The removed item had been categorized as easy to translate and used a graphic. Thus, the final total number of items was nine. The report continues to refer to items by their original numbers 1 through 10, but with item 6 not included in the analysis.
4. Participants' comments were signed; the proctor translated and transcribed them into English.
5. Among the five other responses, responses were not obtained from two individuals and three individuals responded "no." For two of the "no" responses, the proctor captured participant comments suggesting that these participants may have misunderstood the question.

Conflicts of Interest

No conflicts of interest were reported.

Acknowledgments

We gratefully acknowledge Nick Sferra for video production and post-production for the human version; Jason Hurdich of Vcom3D for authoring the avatar versions; Thomas Florek for developing the delivery system; Emily Werfel of Rochester Institute of Technology (RIT) for data collection and preparation; Nan Kong for data analysis; Kitty Sheehan for coordinating travel and payments; and Heather Buzick and Liz Stone for preliminary review. This work was supported by Educational Testing Service (ETS).

References

Ardila, A., & Rosselli, M. (2017). Inner speech in bilinguals: The example of calculation abilities. In A. Ardila, A. B. Cieslicka, R. R. Heredia, & M. Rosselli (Eds.), Psychology of bilingualism: The cognitive and emotional world of bilinguals (pp. 27–37). Cham, Switzerland: Springer International Publishing. doi:10.1007/978-3-319-64099-0_2
Baker, C. (2006). Foundations of bilingual education and bilingualism (4th ed.). Clevedon, England; Buffalo, NY: Multilingual Matters.
Cawthon, S. W., Winton, S. M., Garberoglio, C. L., & Gobble, M. E. (2011). The effects of American Sign Language as an assessment accommodation for students who are deaf or hard of hearing. Journal of Deaf Studies and Deaf Education, 16, 198–211. doi:10.1093/deafed/enq053
Clymer, E., Geigel, J., Behm, G., & Masters, K. (2012). Use of signing avatars to enhance direct communication support for deaf and hard-of-hearing users. National Technical Institute for the Deaf (NTID), Rochester Institute of Technology. Retrieved from http://www.ntid.rit.edu/sites/default/files/cat/NTID-SigningAvatar_20Mar2012_Final.pdf
Higgins, J. A., Famularo, L., Cawthon, S. W., Kurz, C. A., Reis, J. E., & Moers, L. M. (2016). Development of American Sign Language guidelines for K-12 academic assessments. Journal of Deaf Studies and Deaf Education, 21, 383–393. doi:10.1093/deafed/enw051
Hutter, K., & Pagliaro, C. (2016). Is the interpreter in your child's education an educational interpreter? The Endeavor, Fall 2016, 29–33. Retrieved from https://issuu.com/asdc/docs/asdcfall2016v2hires
Johnson, E., Kimball, K., & Brown, S. O. (2001). American Sign Language as an accommodation during standards-based assessments. Assessment for Effective Intervention, 26, 39–47. doi:10.1177/073724770102600207
Liddell, S. K., & Johnson, R. E. (1989). American Sign Language: The phonological base. Sign Language Studies, 64, 197–277. doi:10.1353/sls.1989.0027
Martinez-Lincoln, A., Cortinas, C., & Wicha, N. Y. (2015). Arithmetic memory networks established in childhood are changed by experience in adulthood. Neuroscience Letters, 584, 325–330. doi:10.1016/j.neulet.2014.11.010
Mason, T. C. (2005). Cross-cultural instrument translation: Assessment, translation, and statistical applications. American Annals of the Deaf, 150, 67–72. doi:10.1353/aad.2005.0020
Partnership for Assessment of Readiness for College and Careers. (2017). PARCC accessibility features and accommodations manual (6th ed.). Retrieved from avocet.pearson.com/PARCC/Documents/GetFile?documentId=4900
Russell, M., Kavanaugh, M., Masters, J., Higgins, J., & Hoffmann, T. (2009). Computer-based signing accommodations: Comparing a recorded human with an avatar. Journal of Applied Testing Technology, 10. Retrieved from https://atpu.memberclicks.net/assets/documents/computer_based.pdf
Schick, B., & Williams, K. (2004). The educational interpreter performance assessment: Current structure and practices. In E. A. Winston (Ed.), Educational interpreting: How it can succeed (pp. 186–205). Washington, DC: Gallaudet University Press.
Schick, B., Williams, K., & Kupermintz, H. (2006). Look who's being left behind: Educational interpreters and access to education for deaf and hard-of-hearing students. Journal of Deaf Studies and Deaf Education, 11, 3–20. doi:10.1093/deafed/enj007
Smarter Balanced Assessment Consortium. (2017). Usability, accessibility, and accommodations guidelines. Retrieved from https://portal.smarterbalanced.org/library/en/usability-accessibility-and-accommodations-guidelines.pdf
Stokoe, W. C. (1960). Sign language structure: An outline of the visual communication systems of the American deaf. Studies in Linguistics: Occasional Papers (No. 8). Buffalo, NY: Department of Anthropology and Linguistics, University of Buffalo.
Stokoe, W. C., Casterline, D., & Croneberg, C. (1965). A dictionary of American Sign Language on linguistic principles. Silver Spring, MD: Linstok Press.

© The Author(s) 2018. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices).

Such accommodations may be helpful for reducing construct-irrelevant variance associated with the reading skill levels of D/HH students. Policy in Higher Education Assessments Prior to the passage of the Americans with Disabilities Act (ADA) in 1990, most D/HH students entering higher education headed to dedicated programs such as Gallaudet University or the National Technical Institute for the Deaf. As the ADA Amendments Act (ADA AA) of 2008 and other legislation expanded opportunities for individuals with disabilities to enter institutions of higher education, increasing numbers of D/HH students began enrolling in mainstream colleges, universities, and vocational programs. There is currently no uniform policy across institutions of higher education regarding ASL interpretation of assessments. The primary operative law at the postsecondary level is the ADA AA, which is not specifically focused on education, encompassing equal opportunity issues in employment and in government and commercial services as well. Disability service providers at different institutions of higher education tend to craft their own policies. These policies are loosely coordinated by listservs and through the participation of many United States campus disability service providers in the Association on Higher Education and Disability (AHEAD). AHEAD’s recommendations, however, serve as guidance and do not have the force of law. A D/HH postsecondary student requesting ASL interpretation commonly receives this accommodation for instructional purposes, subject to factors such as availability, the feasibility of alternative accommodations (e.g., real-time captioning), and course or program content. Policies toward interpretation of course-based tests, however, vary tremendously from one institution to another and from course to course within a single institution. Sometimes the variation is due to differing course requirements. For example, a nurses’ training program may require students to know medical terminology in English so that they can read, and quickly respond to, entries in patients’ charts or instructions from physicians. Translating an assessment in that domain into another language would alter the construct. Other courses may be less vocabulary-based but nonetheless include content that is dependent on written language. Similarly, on high-stakes standardized tests, states and testing agencies differ in their policies about ASL interpretation of test content, which, generally speaking, is less likely to be permitted as an accommodation on graduate admissions and professional licensure tests than for postsecondary courses, due to concerns with standardization, consistency and quality of translation, and test security (Johnson, Kimball, & Brown, 2001; Mason, 2005). Two Key Studies of Computer-based Delivery of ASL in Assessment The research literature on ASL translation of tests focuses almost entirely on the K-12 level. Cawthon et al. (2011) sought to measure the effects of an ASL accommodation for D/HH students in reading and mathematics assessments. Sixty-four fifth to eighth grade students (ages 10–15) participated. Each participant was exposed to both the standard (control) and ASL (experimental) conditions. In the standard condition, the directions were translated into ASL, but the reading passages, reading test items, and mathematics test items were provided only in written English. In the ASL condition, those same materials were provided both in ASL (via DVD) and in written English. 
The researchers found no overall differences in the mean percent of items for which students scored correctly in either reading or mathematics, noting that “It is possible that some of the readers… did not use the ASL accommodation as intended either because they did not attend to the accommodation or because they relied more on their reading skills…” (p. 207). Thus, the study did not reveal the extent to which participants actually relied on the signing. Russell et al. (2009) compared recorded human and avatar signers in the context of a computer-based signing accommodation using released NAEP grade 8 mathematics items with 96 D/HH students in grades 8 through 12. While using the human and avatar videos, participants had access to the English items. The researchers found that on a four-option Likert scale (strongly agree, agree, disagree, strongly disagree), 60% of participants agreed or strongly agreed that “It was easy to understand information presented by the signing avatar,” while 83% of participants agreed or strongly agreed that “It was easy to understand information presented by the signing human.” Notwithstanding this preference for human over avatar, there was no significant difference in mathematics performance based on the signed version. This study did not report which aspect of the presentation (English print or computer-based signing) the student relied on to comprehend and problem-solve the mathematics items. Therefore, the lack of group differences could be due to students accessing all available information to problem-solve. A later study by Higgins et al. (2016) did show better performance in mathematics questions by students in grades 3 through 12 when supported by videos of fingerspelling and/or ASL signers, when compared to a condition without such supports. Desirable Features of Additional Research Given the paucity of research in this area, and the limitations of previous research as discussed here regarding translation issues and reliance on signed versions, as well as the increasing requests for ASL test translations, there is a need to continue exploration of the issue of avatar signing versus human signing with a study design that addresses issues that may affect either the validity or the practicality of signed versions in operational mathematics testing. Features that would be desirable in a follow-up study would include the following (all of which the study described in this paper was intended to address): Implementing a testing situation in which participants need to rely on the signing (human or avatar) rather than the English text. Russell et al. (2009) and Cawthon et al. (2011) provided access to the English item for the whole time that the student interacted with the item, which meant that their mathematics performance was not a measure of the comprehensibility of the signing itself. Participants may or may not have actually depended on the signing. A well-controlled study is needed to clarify the usefulness of signed versions in operational mathematics testing. Translating into sign language only the English-heavy features of the item, since one rationale for the signing accommodation for D/HH individuals who use ASL is to bypass the English literacy challenges many of these individuals face. For example, one might consider not signing mathematics expressions, answer options, and/or graphics, since these are largely English-free. Russell et al. (2009) and Cawthon et al. (2011) translated full items. 
Supplementing data on mathematics scores with other indicators of comprehension, for example, having the participant translate items into English (in a context in which they need to rely on the signed versions and have not been exposed already to the English text of the items). By having an indicator that may not be as closely bound to mathematics proficiency, one may support better inferences about the comprehensibility of the signing. Other changes are important to consider, including expanding the sample to include post-high school participants (to make inferences more relevant to higher-education settings) and ensuring the use of a human signer who is RID1-certified with experience interpreting in the educational system. (Russell et al. (2009) did not specify the human signer’s qualifications.) Purpose of this Study The purpose of the current study was to compare the usability of human and avatar videos for Deaf individuals at the high school and post-high school level using pre-college mathematics items, shedding light on some issues still in need of clarification. For purposes of this report, upper-case “Deaf” refers to a deaf or hard of hearing individual who uses ASL. In this study, researchers sought to address the research needs mentioned above, among them a research design in which participants needed to rely on the signed (human or avatar) presentation of items rather than only on the English text. It should be emphasized that the goal of the current project was not to evaluate the usefulness of an ASL translation (versus no translation), but to compare the human and avatar versions. The research questions for the study are as follows: Which signed version is most comprehensible to Deaf participants? To what degree do Deaf participants prefer a human- or avatar-signed version of mathematics test questions? What kinds of usability challenges do Deaf participants experience when using the avatar and human presentation versions? What suggestions do Deaf participants make for improving the usability of signed versions? Is there an association between the reported age of starting to learn ASL and either mathematics score or translation score? Is there evidence that participants benefited from a second signed version or from the full English version? What is the relationship between mathematics score and translation score? What is the relationship between translation score and signed version? Are there indications about the adequacy of the full-English version by itself? Is there a relationship between highest mathematics course taken and total mathematics score? Method Research Design The study was a repeated measures design in which each participant was presented with and responded to each of 10 single-selection multiple choice math items in three conditions – human-signed, avatar-signed, and non-signed. In general, the math score obtained under each condition was taken as an indicator of the usability (e.g., comprehensibility) of that condition. For each of the signed conditions (human [prerecorded] and avatar), participants were asked to translate the item into English. These translations were later rated by experts for their quality in conveying the meaning of the English item. Thus, the translation score was another potential indicator of the usability (e.g., comprehensibility) of the signed versions. Presentation and response formats There was a separate session for each participant. 
For the two signed conditions – human (prerecorded) and avatar – the item was presented on the computer screen. Specifically, on the right side of the screen was the “partial” English math item, that is, in which the English-heavy part of the test item was grayed out (and therefore not visible) and on the left side of the screen was a window for the video of the signer (human or avatar). This is shown in Figure 1, without the navigational controls. Figure 1 View largeDownload slide The video window (avatar) on the left and the partial English version on the right. Screen shot of the computer screen. The left side of the screen is for the signed videos (human or avatar). In this case it is the avatar. On the right side is the partial English math item. In this case the item shows a picture of a scale with an arrow pointing to a value and there are five choices from which the test taker selects one. Figure 1 View largeDownload slide The video window (avatar) on the left and the partial English version on the right. Screen shot of the computer screen. The left side of the screen is for the signed videos (human or avatar). In this case it is the avatar. On the right side is the partial English math item. In this case the item shows a picture of a scale with an arrow pointing to a value and there are five choices from which the test taker selects one. In the signed videos only the English-heavy portions of the item were signed (thereby potentially overcoming an accessibility barrier by reducing non-construct-relevant demands for knowledge of English). The non-signed condition involved presenting on paper a full English version (nothing grayed out) without any signing as shown in Figure 2. Figure 2 View largeDownload slide The full English version of the item, as it appears on paper. The item is the same one as in Figure 1 (with the arrow and the scale). Figure 2 View largeDownload slide The full English version of the item, as it appears on paper. The item is the same one as in Figure 1 (with the arrow and the scale). Note that in the signed conditions, the participant was obliged to rely on the signing video as well as the partial English version in order to correctly answer the item. For all three conditions of all items, participants recorded their answers on paper. Order of conditions The two signed conditions were always presented before the non-signed conditions (full English version). The order of the signed conditions for a given participant was specified by the form (form A or form B) to which the participant was assigned. Forms A and B Participants were assigned either to form A or form B in alternating fashion as their scheduled session with the researcher occurred (e.g., first participant to form A, second to form B, third to form A, and so on). Table 1 shows the order of human and avatar conditions for the first three items; the same pattern continued for the seven other items. As implied in Table 1, all 10 items were presented in the same order to each participant. Table 1 Order of human and avatar conditions for the first three items for Forms A and B Item number Form A Form B Video 1 Video 2 Video 1 Video 2 1 Avatar Human Human Avatar 2 Human Avatar Avatar Human 3 Avatar Human Human Avatar Item number Form A Form B Video 1 Video 2 Video 1 Video 2 1 Avatar Human Human Avatar 2 Human Avatar Avatar Human 3 Avatar Human Human Avatar Video 1 – the signed condition administered first. Video 2 – the signed conditions administered second. 
Table 1 Order of human and avatar conditions for the first three items for Forms A and B Item number Form A Form B Video 1 Video 2 Video 1 Video 2 1 Avatar Human Human Avatar 2 Human Avatar Avatar Human 3 Avatar Human Human Avatar Item number Form A Form B Video 1 Video 2 Video 1 Video 2 1 Avatar Human Human Avatar 2 Human Avatar Avatar Human 3 Avatar Human Human Avatar Video 1 – the signed condition administered first. Video 2 – the signed conditions administered second. Thus, in summary, each participant received each item three times in the following order: First signed condition (a.k.a. video 1) (avatar-signed condition or human-signed condition, as specified by the assigned form) Second signed condition (a.k.a. video 2) (the opposite signed condition) Non-signed condition (a.k.a. full English version) Strategy behind the inclusion of second signed condition (video 2) and the third condition (full English version) Obviously, a person’s performance in a given condition could be influenced by earlier conditions. For example, performance for the second signed condition (video 2) for a given item could be influenced by experience in the first signed condition (video 1). And performance in the non-signed condition (full English version) could be influenced by experience with both video 1 and video 2. These could be seen as a form of “contamination.” It would have been possible to have a design that, for a given item, simply used video 1, because it is very arguable that the score obtained from the first condition for a given item would be the purest indicator of ability; performance in later conditions might be affected by exposure to the item in the earlier conditions. Most of the analyses in this study focus in fact on video 1 (rather than video 2). However, by including video 2 in the study as we did allowed us to attempt to detect the influence of video 1 on performance for video 2. Furthermore, inclusion of the non-signed condition (i.e., using the full English version; shown in Figure 2) was to provide, for comparison, a situation that would arguably be relatively optimal for performance, that is, with the participant having access to the full English version of the item (nothing grayed out) as well as having already seen both the signed versions. Thus, the inclusion of the video 2 and full English version conditions plays a valuable role in this exploratory study, which can inform designs for future larger-scale studies. Sample Following approval from the human subjects review board at Rochester Institute of Technology's National Technical Institute for the Deaf (RIT/NTID), participants were recruited from high schools and colleges in western New York State. To be eligible for participation, students needed to be (a) in high school or post-high school, (b) self-identified as Deaf or hard of hearing (not hearing), (c) self-reported as users of both ASL and written English, and (d) recipients of a grade of “C” or better in their first-semester algebra course. (Four candidates were not eligible due to lower algebra grades.) Researchers sought an approximate 50–50 split between high school and post-high school participants. Participants received gift cards for their participation. The final sample included 12 individuals at the high school level and 19 at the post-high school level. There were 18 males and 13 females; 23 characterized themselves as Deaf and 8 as hard of hearing. The sample had a mean age of 20.35 years (SD = 4.47). 
For self-report of race/ethnicity, 21 were White (not Hispanic), 3 were Hispanic, 2 were Black or African American, 2 were Asian or Pacific Islander, and 3 categorized themselves as “Other.” Regarding the individual’s first language, 21 participants chose ASL or other signing modality (e.g., “sign language,” “ASL and PSE”) and the remaining 10 individuals chose English or another spoken language. The mean age for first exposure to ASL was 5.27 years (SD = 6.08). Participants generally reported good grades in the first semester of high school algebra: 15 indicated “A,” 13 indicated “B” (including 1 “B+,” 1 “A or B,” and 1 “B or above”), and 3 indicated “C.” As shown in Table 2, by and large the participants considered themselves capable in both ASL and English, as indicated by their levels of agreement with literacy questions, though participants indicated the least agreement with the statement: “It is easy for native English users to understand my English writing.” Table 2 Self-report of ASL and English proficiency Question SD D N A SA 1. It is easy for Deaf signers to understand my ASL 0 0 1 9 21 2. It is easy for me to understand native Deaf ASL signers 0 1 3 10 17 3. It is easy for native English users to understand my English writing 0 1 10 13a 7 4. It is easy for me to read and understand written English 0 0 0 13 18 Question SD D N A SA 1. It is easy for Deaf signers to understand my ASL 0 0 1 9 21 2. It is easy for me to understand native Deaf ASL signers 0 1 3 10 17 3. It is easy for native English users to understand my English writing 0 1 10 13a 7 4. It is easy for me to read and understand written English 0 0 0 13 18 Note: SD = strongly disagree; D = disagree; N = neither agree nor disagree; A = agree, SA = strongly agree. aOne individual responded both “agree” and “strongly agree.” This was categorized as “agree.” Table 2 Self-report of ASL and English proficiency Question SD D N A SA 1. It is easy for Deaf signers to understand my ASL 0 0 1 9 21 2. It is easy for me to understand native Deaf ASL signers 0 1 3 10 17 3. It is easy for native English users to understand my English writing 0 1 10 13a 7 4. It is easy for me to read and understand written English 0 0 0 13 18 Question SD D N A SA 1. It is easy for Deaf signers to understand my ASL 0 0 1 9 21 2. It is easy for me to understand native Deaf ASL signers 0 1 3 10 17 3. It is easy for native English users to understand my English writing 0 1 10 13a 7 4. It is easy for me to read and understand written English 0 0 0 13 18 Note: SD = strongly disagree; D = disagree; N = neither agree nor disagree; A = agree, SA = strongly agree. aOne individual responded both “agree” and “strongly agree.” This was categorized as “agree.” Materials Items This section describes the math items, including their mathematics content, and how the items were developed and delivered. The items were used to obtain the math score (one for each of the three conditions) and the translation score (one for each of two signed conditions). Mathematics content The 10 mathematics items were selected from a set of items that had been used as practice materials for an existing standardized mathematics test. The test addressed pre-college mathematics and included content that would be encountered in typical middle school or early high school mathematics classes. All 10 items had a multiple-choice response format with five answer options. 
Key criteria for selecting items were that half (i.e., five) be judged by two experts in education and ASL, one of whom also has expertise in mathematics education, as “easy” to translate from English into ASL, and that the others be judged as “hard” to translate. Three of the items had graphics. Creation of the ASL videos of the signed versions In developing the two signed versions, the intent was to provide translations into ASL that were as comparable as possible to those of native ASL signers. However, each of these versions differs in some respects from native ASL. Therefore, we will refer to the avatar version as “digital ASL,” to indicate that it was produced electronically (i.e., using a computer-based authoring environment), and the human version as “L2 ASL” to indicate that the signer (an RID-certified interpreter) had learned ASL as a second language. Only the English-heavy portions of the item stems were interpreted into ASL. Figures, which were present in two of the 10 items, occurred only as stimuli (rather than as part of the item stem or response options) and were not interpreted, nor were response options. The rationale behind this decision includes (1) that a major factor in D/HH individuals’ challenges in understanding mathematics is the English components, which on this mathematics test are primarily in the stems; (2) that mathematics expressions are in themselves English-free, and (3) that English is ordinarily used sparingly in graphics (figures) within mathematics items, primarily for labels (e.g., “angle B”). It was assumed that D/HH individuals would generally experience considerably less difficulty with such labels than with more complicated connected text. In addition, response options are often single words, letters, numbers, or brief mathematical expressions; so often they too present less English complexity than stems, which may involve more complicated grammatical structure. To minimize screen clutter, researchers also decided not to include graphics and written mathematical expressions in the ASL delivery because this content would already be present as on-screen text, along with the response options. Prior to the recording session, researchers and ASL expert consultants conferred with mathematics test developers to ensure that translations were not only linguistically correct but also maintained construct validity. Procedure for creating the human version An RID-certified sign language interpreter with over 10 years’ experience interpreting for D/HH students in K-12 schools served as the human signer. Developing the human version, created before the avatar version, involved an iterative process: the expert consultants (a Deaf ASL/English bilingual researcher with experience translating high-stakes test items into ASL, and a hearing researcher with superior ASL fluency and expertise in mathematics education, with assistance from a member of the research team with experience in both test development and ASL linguistics research) agreed on a translation and worked with the signer to rehearse and refine it before recording. For some items, one of these consultants recorded rough self-videos as models for the interpreter. The team provided the interpreter with an enlarged printout of an English-word glossing of the ASL translation to use as a memory aid while signing. The videos were recorded to show the typical signing space, that is, head to hips, excluding legs. 
After each recording of each item, the team, including the signer, watched and critiqued the video. The signer rehearsed and then recorded the translation again, if necessary. This process was repeated until the team was satisfied; then they moved on to the next item. This production process occurred over a two-day period. Procedure for creating the avatar version The avatar version was created by a company specializing in signing avatars (Vcom3D).2 Resources provided to Vcom3D included the English items and some general instruction (e.g., use a white male avatar to match the race/ethnicity and gender of the human signer – the intention being to eliminate race/ethnicity as a confounding variable in the study); the same English-word glosses and videos provided to the human signer; some minimal direct question-and-answer with the researchers; and feedback from researcher review. Vcom3D was never given access to the videos of the human signer so as not to provide a resource that would be unavailable in an authentic situation in which avatar signers (instead of human signers) might be used in test development. On average, each item underwent about 2 1/2 review cycles before completion. This process occurred over several weeks. Every revision was based on a review by the same three-person team that worked on the human versions. Delivery of the items and recording of responses As mentioned earlier, the two signed conditions were delivered on computer (specifically, a 15-inch high-resolution Apple Mac laptop) and the non-signed condition (full English version) was delivered on paper. Responses to all items were recorded on paper. The participant also recorded the translation of the item into English on the item response sheet. Participants provided two translations – one for each of the two signed conditions. Participants were able to pause and resume the video as well as use a slider to navigate to any portion of the video. Accompanying the on-screen presentation were a test booklet and English version forms. Test form booklets The test form booklet (for either form A or form B) used by each participant consisted of 10 pieces of 8 1/2 by 11-inch paper printed double-sided, with two printed sides (item response sheets) per item – one for the first signed condition (video 1) and one for the second signed condition (video 2). Each item response sheet showed the partial English version of the item (i.e., appropriately grayed out, just as it was presented on screen for videos 1 and 2). The item response sheets were used to capture participants’ answers to the mathematics questions and their translations of items into English for each of the two signed conditions. English version sheet Participants’ answers to the mathematics questions for the full English version were collected on the English version forms. This consisted of one 8 1/2 by 11-inch sheet for each item. This sheet was essentially the same as the item response sheet, but showed the full English version rather than the partial English (i.e., partially grayed out) version. The purpose was to capture students’ responses to the mathematics items for the non-signed (full English version) condition. 
Other instruments Other instruments included the background interview (e.g., age, ethnicity, hearing status, self-reports of ASL and English proficiency, mathematics background), post-video questions (e.g., ease of understanding the signing), post-pair questions (e.g., which signed version was easiest to understand), post-English version form, and post-item-set questions (PISQ). Session Procedure The procedure for each participant was as follows: Proctor assigned participant to form A or B and gave instructions to the participant. For each of 10 items, the participant: viewed video 1, answered the mathematics item (by marking the response sheet), translated the item into English (on response sheet), and then answered the post-video questions; viewed video 2, answered the item again, translated it again into English, and answered post-video interview questions for video 2; answered the post-pair interview questions; answered the item again with access to the full English version, and answered the post-English-version interview questions. After completing all 10 items, the participant answered the post-item-set questions (PISQ). Each session lasted about 90 min. All participants communicated with the proctor in ASL. Data Analysis While the number of participants was relatively small (n = 31), a number of statistical analyses were performed. For each of the 31 students there were nine questions analyzed (item 6 was excluded from the 10 originally administered).3 Of the nine items that we analyzed, two items, 4 and 7, used graphics. One item used a mathematics expression in the stem. As mentioned earlier, we often used data from video 1 (rather than either video 2 or a union (pooling) of data from video 1 and video 2). This is because we considered that an initial exposure to an ASL version (video 1) best reflected the use of ASL in an actual testing situation (i.e., the test taker answers the item without already having seen a different signed presentation of the item). Scoring of participant translations The translation score was determined by two researchers with superior ASL fluency and expertise in mathematics education. The two raters each assigned each translation (ASL to English) a score on a scale of 0–5, with 0 representing no response and 5 representing a near-perfect translation. (Initially, one rater inadvertently failed to rate 5 of the translations.) Researchers agreed that generally, if the raters’ translation scores differed by one point, this would be considered acceptable, indicating near-agreement on the accuracy of the translation. Any discrepancy of more than one point between the two raters’ scores or between scores of 2 (unacceptable translation) and 3 (minimally acceptable translation) was addressed and reconciled in a single two-hour meeting, facilitated by the ASL-knowledgeable member of the research team. At the beginning of the session, the five missing ratings were supplied by that rater (who, unfortunately, had already received exposure to the other rater’s scores). Then the discrepancies were reconciled; in every case, the raters quickly reached a consensus score. The final score used in later analysis consisted of (a) the agreed-upon score where there was exact agreement, (b) the consensus score resulting from the reconciliations, or (c) the higher of the two values where the scores did not require reconciliation. Reliability of scoring of translations. 
Translation raters 1 and 2 showed considerable agreement in their evaluations of the quality of participants’ ASL-to-English translations. The translation score of rater 1 exactly matched that of rater 2 58.6% of the time. The translation score of rater 1 was adjacent to or exactly the same 90.6% of the time. The kappa (unweighted kappa value) was .46 (standard error: .026, p value <.001). The five translations for which ratings were not initially offered by one rater were omitted from agreement calculations, thereby yielding a total of 553 (instead of 558) translations for agreement statistics (with two raters per response). Coding of levels of agreement For the purposes of analysis, the levels of agreement in Likert scales were converted to numbers as follows: strongly disagree (SD) = 1, disagree (D) = 2, neither agree nor disagree (N) = 3, agree (A) = 4, strongly agree (SA) = 5. Tabulation and statistical significance Results were tabulated, and in some cases statistical tests were run. Unless otherwise stated, we used an alpha level of .05 for all statistical tests. Basic description of participants All 31 of the participants answered all nine items. Specifically, each of the 31 participants responded to each mathematics item 3 times, once for each of the two videos and then once for the full English version. Each of the 31 participants made two translations from ASL, once for video 1 and another for video 2. Of the 31 participants, 16 were assigned to form A and 15 to form B. For the highest level of mathematics taken, following are the numbers of participants (based on self-report): 3 at level 1 (algebra 1), 3 at level 2 (geometry), 11 at level 3 (algebra 2 and trigonometry), 10 at level 4 (pre-calculus), 4 at level 5 (calculus). Form effect There was not a significant form effect with respect to mathematics score (for video 1), based on a t-test for equality of means for form A (M = 3.31, SD = 1.92) and form B (M = 4.27, SD = 2.09), t(29) = 1.32, p = .195, two-tailed. Different aggregations of math scores In the analyses conducted, math scores (as well as translation scores) were aggregated in various ways. For example for a given participant, item, and condition, the math score is simply a 1 (correct) or a 0 (incorrect). A math score for an individual across the 9 items will depend, for example, on whether it is summed by video (e.g., all scores for video 1), by signed condition (e.g., all scores for the avatar-signed condition), or by full English version (non-signed condition). Obviously, math scores for groups of participants can be based on any of these scores for individuals. Completeness of key data For 31 participants and nine items, we obtained three math scores (video 1, video 2, and full English version) and two translation scores (video 1 and video 2). Results Key findings regarding the research questions were as follows. 1. Which Signed Version is Most Comprehensible to Deaf participants? Two key indicators of comprehensibility of the signing are (a) relationship between mathematics score and signed version and (b) reported quality of signing. Relationship between mathematics score and signed version For video 1, there was no significant relationship between math score (1 or 0) and signed version (avatar versus human), chi-square(1, N = 279) = .030, p = .863. Note that this statistic is based on the math scores of 31 participants for each of the 9 items. 
Results were similarly not significant for both video 2, chi-square(1, N = 279) = .430, p = .512, and for the union of data from video 1 and video 2, chi-square(1, N = 558) = .117, p = .733. Thus, notwithstanding the higher perception of quality for the human version (paragraph above) no significant difference in math score based on signed version was detected. This lack of significant difference could have been due to the small sample size. Reported quality of signing Participants tended to rate the quality of the signing higher for the human version than for the avatar version. Specifically, 61.3% of participants responded “very good” or “excellent” to the question of quality of human translation, while for the same question for avatar, only 9.7% of participants responded “very good” or “excellent.” (Options were excellent, very good, good, fair, and poor.) 2. To What Degree do Deaf Participants Prefer a Human or Avatar Signed Version of Mathematics Test Questions? Preference was addressed by two questions. First, in response to the question, “Which kind of signing did you prefer for mathematics items – avatar or human? (avatar, human, no preference),” 29 of 31 participants preferred human, 1 preferred avatar, and 1 indicated “no preference.” Second, participants also indicated greater agreement with a recommendation of the human version over the avatar version for mathematics tests. Specifically, in response to the statement, “I would recommend the human version in mathematics tests for high school or higher education students who are deaf or hard of hearing” (emphasis added), 26 participants (83.9%) agreed or strongly agreed. On the other hand, for the same question regarding the avatar version, only five participants (16.1%) agreed or strongly agreed. 3. What Kinds of Usability Challenges do Deaf Participants Experience When Using the Avatar and Human Versions? Avatar The main challenge identified for the avatar versions was the relative lack of facial expression and body language (e.g., “There are no facial expression and mouth doesn’t mouth the words4”). Besides indicating affect, these features are essential to ASL grammar. Although the facial expressions and body positions of ASL were present to some degree in the avatar versions, they were by no means as fully developed or fluent as in human signing. Some participants mentioned not being familiar with signing avatars and so found them difficult to understand (e.g., “Usually see human signing – not avatar – so I was not used to translating avatar’s ASL to English”). Human Participants noted few challenges specific to the human versions, apart from one person’s observation that the human signer should pause more frequently. Human and avatar Some challenges were noted as common to both avatar and human versions. These included: some vocabulary (perhaps the mathematics-related vocabulary, though this was not specified) was unfamiliar; and some signs were difficult to translate into English. (Some bilingual signers with limited translation experience may not know which English word to use to translate a given ASL sign, even if they know both the sign and a good English equivalent word or phrase.) Check on visual appearance Participant responses suggest that a large majority of participants found the computer- and paper-based materials visually adequate. 
A clear majority of participants responded “yes” to the following questions: “Were the videos easy to see (big enough and sharp enough image)?” (30 “yes” responses); “Were the background colors for the avatar and human acceptable?” (29 “yes” responses); “Was the lighting for the avatar and human acceptable?” (31 “yes” responses); “Were the English items on computer easy to see?” (26 “yes” responses5); and “Were the English items on paper easy to see?” (31 “yes” responses). 4. What Suggestions do Deaf Participants Make for Improving the Usability of Signed Versions? Participants made several suggestions for improving the signed versions (in the post-item set, question 19). For the avatar version, key suggestions centered on issues such as facial expressions and mouthing, overall fluency/fluidity of signing, and fingerspelling and signing of numbers. For the human version, most participants were fairly satisfied, apart from the general issues that applied to both avatar and human versions. Among the suggestions relevant for both avatar and human are: (1) Use human, not avatar, signing. Or: do not use video at all, but instead a live signer. This reflects the clear preference for human signer over avatar. (2) Caption all videos, or have a full English version available on paper or elsewhere. Many participants felt strongly that they, or other D/HH individuals, would have benefited from seeing full English versions along with ASL. (This, as noted, was not possible due to design considerations: displaying English along with ASL would have made it impossible to evaluate the ASL’s usefulness, as distinct from the usefulness of the English text.) Of course, in actual use, if videos of ASL were used for test content, the test taker would have access to the English version of the test as well. (3) Use more mouth movement and facial expression. (4) Sign English, not ASL. One of the research consultants reported that a number of the participants were unfamiliar with mathematics content presented in ASL. Participants had apparently learned mathematics from written (or on-screen) English, along with graphics, and/or from signed instruction in a sign system that uses ASL signs in roughly English word order. For these individuals, any potential advantage of the ASL versions may have been neutralized by their unfamiliarity with the presentation of mathematics content in ASL. 5. Is There an Association Between the Reported Age of Starting to Learn ASL and Either Mathematics Score or Translation Score? One-way analyses of variance were computed and did not indicate a significant association between participants’ reported age of starting to learn ASL and either their total mathematics score for video 1, F(4, 25) = .75, p = .566, or their total translation score, F(4, 26) = .85, p = .507. 6. Is There Evidence that Participants Benefited From a Second Signed Version or from the Full English Version? Participants’ total mathematics scores for video 1 (M = 3.77, SD = 2.03) and video 2 (M = 4.10, SD = 1.96) were not significantly different from each other based on an analysis of variance (ANOVA); t(30) = 1.54, p = .134, two tailed. However, the participants’ mathematics scores for the full English version (M = 4.61, SD = 2.43) were significantly different from both video 1 mathematics scores; t(30) = 3.14, p = .004, two tailed, and from video 2 mathematics scores; t(30) = 2.15, p = .040, two-tailed. 
Thus, we did not detect a significant difference (e.g., growth) between video 1 and video 2, yet exposure to the full English version did appear to boost mathematics scores. Thus, the key finding is that, as expected, participants performed better in the non-signed condition (full English version) (having already seen the earlier signed conditions) than they performed in those signed conditions (video 1 and video 2). Although there was no significant difference between performance on video 1 and video 2, the observed difference was in the expected direction, with video 2 being higher than video 1.1 Analysis by item, showing the number of participants answering correctly As shown in Table 3, for the most part, the number of people answering a math item correctly in the non-signed condition (full English version) was equal to or higher than the number answering correctly in the video 1 and video 2 conditions. (This is indicated by “No” in the last row of Table 3.) This was expected because the non-signed condition allowed access to the full English item after having already seen both the human and avatar videos. As may be seen in Table 3, this did not occur for items 1 and 7 (see the “Yes” in the last row). We do not know why, for items 1 and 7, the number of people answering correctly for the full English version was lower than for the signed conditions (video 1 and video 2). It may have been due to random noise. Table 3 Number of people answering the item correctly for Video 1, Video 2, and the full English version Condition Item 1 2 3 4 5 7 8 9 10 Video 1 18 10 14 19 16 8 6 11 15 Video 2 17 12 14 20 15 9 13 13 14 Full English version 16 16 17 21 16 6 15 20 16 Number for full English version is less than the number for both signed versions (video 1, video 2) Yes No No No No Yes No No No Condition Item 1 2 3 4 5 7 8 9 10 Video 1 18 10 14 19 16 8 6 11 15 Video 2 17 12 14 20 15 9 13 13 14 Full English version 16 16 17 21 16 6 15 20 16 Number for full English version is less than the number for both signed versions (video 1, video 2) Yes No No No No Yes No No No Note: As mentioned earlier, item 6 was not used for the analysis. Table 3 Number of people answering the item correctly for Video 1, Video 2, and the full English version Condition Item 1 2 3 4 5 7 8 9 10 Video 1 18 10 14 19 16 8 6 11 15 Video 2 17 12 14 20 15 9 13 13 14 Full English version 16 16 17 21 16 6 15 20 16 Number for full English version is less than the number for both signed versions (video 1, video 2) Yes No No No No Yes No No No Condition Item 1 2 3 4 5 7 8 9 10 Video 1 18 10 14 19 16 8 6 11 15 Video 2 17 12 14 20 15 9 13 13 14 Full English version 16 16 17 21 16 6 15 20 16 Number for full English version is less than the number for both signed versions (video 1, video 2) Yes No No No No Yes No No No Note: As mentioned earlier, item 6 was not used for the analysis. Also as shown in Table 3, item 4 was the easiest item (the most people answering it correctly), while item 7 was the most difficult item (the least number of people answering it correctly). Item 7 required the student to identify from a list of choices the fraction that was halfway between two other given fractions. 7. What is the Relationship Between Mathematics Score and Translation Score? 
7. What is the Relationship Between Mathematics Score and Translation Score?

Participants’ total mathematics scores for video 1 (out of nine possible points; M = 3.77, SD = 2.03) were strongly correlated with their total translation scores (out of 45 possible points, 9 items × 5 possible points per translation; M = 21.71, SD = 6.02), r = .655, p < .001.

8. What is the Relationship Between Translation Score and Signed Version?

For video 1, there was only a slight, nonsignificant association between translation score and signed version, chi-square(5, N = 310) = 7.92, p = .161. There was no significant association between translation score and signed version for video 2, chi-square(5, N = 310) = 1.44, p = .920, or for the combined data from video 1 and video 2, chi-square(5, N = 558) = 2.85, p = .724.

9. Are there Indications About the Adequacy of the Full-English Version by Itself?

The value of the English version by itself was not directly evaluated in this project. Direct evaluation might have required examining participant performance on a full English version both with and without prior exposure to a signed version. (By contrast, in the present study, participants saw the full English version of an item only after having been exposed to both signed versions.) Nevertheless, 93.2% either agreed or strongly agreed with the statement, “The English version alone (without any signing) would be enough for me to answer this item correctly” (rated SD = 1, D = 2, N = 3, A = 4, SA = 5).

10. Is there a Relationship Between Highest Mathematics Course Taken and Total Mathematics Score?

A one-way analysis of variance (ANOVA) was used to test for differences in total mathematics score (for nine items) for video 1 across the levels of participants’ highest mathematics course taken (1 = algebra I; 2 = geometry; 3 = algebra II/trigonometry; 4 = pre-calculus; 5 = calculus). We found no significant difference in participants’ total mathematics scores for video 1 based on participants’ highest mathematics course taken, F(4, 26) = .79, p = .540. The number of participants at some levels was small (n = 3 for two levels and n = 4 for one level), so it is not possible to attach much practical significance to this finding.
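To make the form of these analyses concrete, the following is a minimal sketch in Python with SciPy. All data here are hypothetical placeholders; only the shapes of the analyses reported above are mirrored: a Pearson correlation between mathematics and translation totals, a chi-square test of association on a 6 × 2 table of translation-score level by signed version (df = 5), and a one-way ANOVA of mathematics totals across grouping levels such as highest mathematics course taken.

# Illustrative sketch only, using hypothetical placeholder data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 31

# Hypothetical totals: mathematics (0-9 points) and translation (0-45 points).
math_total = rng.integers(0, 10, size=n)
translation_total = np.clip(math_total * 5 + rng.integers(-5, 6, size=n), 0, 45)

# Pearson correlation between the two totals.
r, p_r = stats.pearsonr(math_total, translation_total)

# Chi-square test of association: translation-score levels (6 rows) by signed
# version (2 columns), giving (6 - 1)(2 - 1) = 5 degrees of freedom.
contingency = rng.integers(5, 40, size=(6, 2))
chi2, p_chi, dof, _ = stats.chi2_contingency(contingency)

# One-way ANOVA of mathematics totals across five groups; the even split below
# is a stand-in for grouping by a categorical variable such as highest course taken.
groups = np.array_split(math_total, 5)
f, p_f = stats.f_oneway(*groups)

print(f"r = {r:.3f} (p = {p_r:.3f}); chi-square({dof}) = {chi2:.2f} (p = {p_chi:.3f}); "
      f"F = {f:.2f} (p = {p_f:.3f})")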
Discussion

The expectation that there would be differences in usability between the avatar and human signed versions of the mathematics items was only partly supported by this study. Participants showed a clear-cut preference for human signing over the avatar versions, as was observed by Russell et al. (2009). Participants believed that the quality of signing in the human version was higher than that of the avatar version. This study also appears consistent with findings by Russell et al. (2009) that signed version (human versus avatar) was not significantly associated with mathematics score. Furthermore, in this study, signed version was not significantly associated with translation score. Thus, neither mathematics scores nor translation scores provided significant evidence that one version was more comprehensible than the other.

Mathematics score and translation score were strongly correlated (see research question 7). It is difficult to say which way the causation runs: is mathematics content knowledge (as well as language knowledge) required to produce an accurate translation, or is language comprehension a prerequisite to understanding the mathematics content as presented? Presumably, each plays a role. One of the researchers observed that participants who struggled to translate the signed versions into English did not appear to grasp the underlying mathematics content very well. This is typical of translation situations: understanding the content is crucial to a good translation. But the converse is likely true as well: understanding the language in which content is presented is essential to understanding that content. That is, students who use ASL socially, but whose mathematics education has been in English, may have difficulty understanding even familiar mathematics content when it is presented in ASL. An explanation for such a difficulty could be informed by studies that point to the strength of the language of instruction for bilingual students and adults (Ardila & Rosselli, 2017; Martinez-Lincoln, Cortinas, & Wicha, 2015). These studies suggest that, when solving problems or recalling mathematics concepts, learners tend to revert to the language in which they learned mathematics, regardless of whether that language is their native language. Although this can be reversed with high proficiency in the second language, it may be the case here that participants resorted to English, the language in which they first learned the needed mathematics concepts. This would also support the earlier hypothesis that the participants did not have the high-level cognitive/academic language proficiency (CALP; Baker, 2006) in ASL that would have given them full access to the problems. An alternative explanation might be that the interpreter’s signing was not clear to participants, creating difficulties in translating the items from ASL to English. Future studies may benefit from cognitively debriefing the draft item translations with student users, in addition to the research team, before finalizing the ASL translations.

The lack of differences in comprehensibility (as indicated by mathematics scores and translation scores) between the avatar and human versions, despite the clear discrepancy in satisfaction with these versions, may have any of several explanations. It may be due to the small sample size. It may also be due to participants’ limited familiarity with mathematics presented in ASL, which, of course, raises the issue of the need for such translations. However, we would argue that the need for such translations stems from the low achievement of these students. If presentation and instruction in English is not affording students success in mathematics, perhaps students should be given the opportunity to learn mathematics through ASL with accurate and appropriate signs. This could yield a greater understanding of mathematics as presented in ASL, leading to greater achievement. Despite some movement in this area, this has yet to become a reality for most students.

Another part of the explanation may lie in a trend consistent with current phenomena in deaf education and educational interpreting. Many D/HH students have become accustomed to working with educational interpreters and deaf education teachers with a wide range of signing proficiency (Schick, Williams, & Kupermintz, 2006). When the interpreter or teacher is not proficient in ASL, ASL-signing students must try their best to understand signing that may be semantically unclear and that may differ from ASL in significant ways, including the absence or misuse of ASL grammatical markers conveyed by gaze, face, and body.
Since the model signer in our study was a K-12 certified interpreter with extensive experience in mainstreamed school settings, it is possible that some of our participants, particularly those accustomed to Deaf teachers’ ASL signing style, had difficulty comprehending this interpreter’s signing style. Schick, Williams, and Kupermintz (2006) found significant variation in the types of signing used by educational interpreters and in their interpreting skills. Even if D/HH students’ instructional language for mathematics has been ASL (or some other form of signing), it is likely that, despite our efforts to use accurate ASL, the signing in our videos differed from what those students have seen in instructional settings.

Another consideration that may have influenced the results of the study is participants’ mathematics proficiency. Participants’ total mathematics scores for video 1 had a mean of 3.77 (SD = 2.03) out of 9 possible points (1 point per item), which is arguably low, given that the items addressed pre-college mathematics and included content that would be encountered in typical middle school or early high school mathematics classes. The current study used mathematics score as an indicator of the usability (e.g., comprehensibility) of the given conditions rather than as an indicator of mathematics proficiency. However, the relatively low scores raise the possibility that participants’ underlying mathematics proficiency was low relative to the difficulty of the items, and that results might differ if the mathematics difficulty of the items were more closely matched to the mathematics proficiency of the participants.

Though not a major focus of the current study, it is worth noting that, given the current state of the art, producing a good avatar version may not be significantly less expensive than producing a good human version. Both versions involved multiple cycles of revision. Most of the savings from using avatar versions would likely accrue where corrections need to be made: revising the avatar version can be done on computer, whereas revising a human version could involve substantial costs for actors and video production staff.

It appears that the key challenge identified for the avatar versions was the relative lack of facial expression and body language. These features are essential to ASL grammar, and the avatar’s facial expressions and body positions were not nearly as fully developed or fluent as those in the human signing. Furthermore, gaze direction, which is crucial to ASL grammar and discourse, was not used meaningfully in the avatar versions. None of the participants mentioned gaze specifically, but they may have thought of it as a component of facial expression.

Limitations

The study had several limitations. First, although the goal was to use ASL, the digital ASL (avatar) and L2 ASL (human) signers did not make full use of certain non-manual aspects of ASL. The human interpreter’s non-native acquisition of ASL could have resulted in incomplete mastery of these components of the language. Additionally, the relatively small sample size (31 participants) and the small number and narrow variety of items (nine items, all multiple-choice, all pre-college mathematics) limit the interpretation of some of the study results. Also, the design was not counterbalanced (e.g., the full English version always occurred last), which made it impossible to control for possible order effects.
A limitation that may have worked against acceptance of the avatar was participants’ lack of familiarity with avatar signing. In addition, the study did not compare a version in which mathematics expressions were signed and graphics were described in ASL with a version such as the one presented here, in which only the English-heavy components of the items were signed. Finally, five of the 558 ratings were obtained, as noted above, under less-than-ideal conditions (i.e., one rater had already been exposed to the other rater’s scores). It should be kept in mind that the goal of the current project was to compare the human and avatar versions rather than to evaluate the usefulness of an ASL translation (versus no translation).

Suggestions for Future Research

Additional research is needed to address remaining issues. As noted earlier, there is a need to determine whether signed presentation of the test content has an effect, positive or negative, on mathematics performance. This would involve evaluating the comprehensibility of mathematics items with and without ASL videos. Such a study would need to oblige participants in the English-only condition to rely on their English proficiency alone, without their first having been exposed to a signed version (as they were in the current study). There is also a need to find ways to improve the avatar’s expression (face [including gaze], mouth, and body), the clarity of fingerspelling and numbers, and the fluidity of the signing. Furthermore, best practices need to be established for signing figures, mathematics expressions, and response options. If the items are usable (and if other validity criteria are met) without signing these elements, this would suggest that signing these non-English portions of mathematics items may not be necessary. To ensure, to the greatest extent possible, that the signed version is strongly ASL, it is suggested that future studies similar to the present one involve a Deaf, certified deaf interpreter (CDI) as the human interpreter and/or a certified educational interpreter, who would have content knowledge as well as an understanding of how to interpret academic concepts (Hutter & Pagliaro, 2016; Schick & Williams, 2004).

Significance

This study was intended to take important initial steps toward informing testing organizations about the usability of ASL videos (human and avatar) in the context of mathematics assessments. The study gains importance from its finding of no significant association between signed version and either mathematics score or translation score. It goes beyond earlier studies in two key ways. First, it obliged participants to rely on the signed versions, thereby allowing better inferences about the impact of the signing itself, rather than letting the signing’s effect be confounded with the impact of participants’ understanding of the English text. Second, by using translation score, this study provided an additional indicator (beyond mathematics score) of participant comprehension of the signed versions. Additionally, the study provides a useful snapshot of the current state of avatar technology. These initial results suggest that improvements to avatar videos are needed before their user acceptance will equal that of human videos.

Notes

1. Registry of Interpreters for the Deaf.
2. Clymer, Geigel, Behm, and Masters (2012) cite Vcom3D as one of several organizations with a state-of-the-art signing avatar system.
3. Through analysis of participants’ translations from ASL into English, it was found that for one item, a significant piece of information was missing from both the human and avatar versions. Because of this missing information, data from this item were not included in the results. The removed item had been categorized as easy to translate and used a graphic. Thus, the final total number of items was nine. The report continues to refer to items by their original numbers, 1 through 10, with item 6 not included in the analysis.
4. Participants’ comments were signed; the proctor translated and transcribed them into English.
5. Among the five other responses, for two individuals responses were not obtained, and three gave “no” responses. For two of the “no” responses, the proctor captured participant comments suggesting that these participants may have misunderstood the question.

Conflicts of Interest

No conflicts of interest were reported.

Acknowledgments

We gratefully acknowledge Nick Sferra for video production and post-production for the human version; Jason Hurdich of Vcom3D for authoring the avatar versions; Thomas Florek for developing the delivery system; Emily Werfel of Rochester Institute of Technology (RIT) for data collection and preparation; Nan Kong for data analysis; Kitty Sheehan for coordinating travel and payments; and Heather Buzick and Liz Stone for preliminary review. This work was supported by Educational Testing Service (ETS).

References

Ardila, A., & Rosselli, M. (2017). Inner speech in bilinguals: The example of calculation abilities. In A. Ardila, A. B. Cieslicka, R. R. Heredia, & M. Rosselli (Eds.), Psychology of bilingualism: The cognitive and emotional world of bilinguals (pp. 27–37). Cham, Switzerland: Springer International Publishing. doi:10.1007/978-3-319-64099-0_2

Baker, C. (2006). Foundations of bilingual education and bilingualism (4th ed.). Clevedon, England; Buffalo, NY: Multilingual Matters.

Cawthon, S. W., Winton, S. M., Garberoglio, C. L., & Gobble, M. E. (2011). The effects of American Sign Language as an assessment accommodation for students who are deaf or hard of hearing. Journal of Deaf Studies and Deaf Education, 16, 198–211. doi:10.1093/deafed/enq053

Clymer, E., Geigel, J., Behm, G., & Masters, K. (2012). Use of signing avatars to enhance direct communication support for deaf and hard-of-hearing users. National Technical Institute for the Deaf (NTID), Rochester Institute of Technology. Retrieved from http://www.ntid.rit.edu/sites/default/files/cat/NTID-SigningAvatar_20Mar2012_Final.pdf

Higgins, J. A., Famularo, L., Cawthon, S. W., Kurz, C. A., Reis, J. E., & Moers, L. M. (2016). Development of American Sign Language guidelines for K-12 academic assessments. Journal of Deaf Studies and Deaf Education, 21, 383–393. doi:10.1093/deafed/enw051

Hutter, K., & Pagliaro, C. (2016). Is the interpreter in your child’s education an educational interpreter? The Endeavor, Fall 2016, 29–33. Retrieved from https://issuu.com/asdc/docs/asdcfall2016v2hires

Johnson, E., Kimball, K., & Brown, S. O. (2001). American Sign Language as an accommodation during standards-based assessments. Assessment for Effective Intervention, 26, 39–47. doi:10.1177/073724770102600207

Liddell, S. K., & Johnson, R. E. (1989). American Sign Language: The phonological base. Sign Language Studies, 64, 197–277. doi:10.1353/sls.1989.0027

Martinez-Lincoln, A., Cortinas, C., & Wicha, N. Y. (2015). Arithmetic memory networks established in childhood are changed by experience in adulthood. Neuroscience Letters, 584, 325–330. doi:10.1016/j.neulet.2014.11.010

Mason, T. C. (2005). Cross-cultural instrument translation: Assessment, translation, and statistical applications. American Annals of the Deaf, 150, 67–72. doi:10.1353/aad.2005.0020

Partnership for Assessment of Readiness for College and Careers. (2017). PARCC accessibility features and accommodations manual (6th ed.). Retrieved from avocet.pearson.com/PARCC/Documents/GetFile?documentId=4900

Russell, M., Kavanaugh, M., Masters, J., Higgins, J., & Hoffmann, T. (2009). Computer-based signing accommodations: Comparing a recorded human with an avatar. Journal of Applied Testing Technology, 10. Retrieved from https://atpu.memberclicks.net/assets/documents/computer_based.pdf

Schick, B., & Williams, K. (2004). The educational interpreter performance assessment: Current structure and practices. In E. A. Winston (Ed.), Educational interpreting: How it can succeed (pp. 186–205). Washington, DC: Gallaudet University Press.

Schick, B., Williams, K., & Kupermintz, H. (2006). Look who’s being left behind: Educational interpreters and access to education for deaf and hard-of-hearing students. Journal of Deaf Studies and Deaf Education, 11, 3–20. doi:10.1093/deafed/enj007

Smarter Balanced Assessment Consortium. (2017). Usability, accessibility, and accommodations guidelines. Retrieved from https://portal.smarterbalanced.org/library/en/usability-accessibility-and-accommodations-guidelines.pdf

Stokoe, W. C. (1960). Sign language structure: An outline of the visual communication systems of the American deaf. Studies in Linguistics: Occasional Papers (No. 8). Buffalo, NY: Department of Anthropology and Linguistics, University of Buffalo.

Stokoe, W. C., Casterline, D., & Croneberg, C. (1965). A dictionary of American Sign Language on linguistic principles. Silver Spring, MD: Linstok Press.

© The Author(s) 2018. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices).
