Arora, Renee

Abstract

In this article, we report on an empirical comparison of two common gesture recognition techniques. Thirty-one novices completed six realistic tasks using either Jot or Graffiti. An analysis of error-corrected data entry rates indicates that participants using Jot completed the tasks significantly faster than those using Graffiti. An analysis of uncorrected errors yielded no significant differences while several questions assessing subjective satisfaction yielded significantly more positive results for Jot. A new event called Period of Difficulty (PoD) is proposed to help identify situations where novices experience significant difficulty. Users experience more PoD when entering basic alphanumeric characters using Graffiti than they do using Jot. In contrast, Jot users experience more PoD when entering symbols than Graffiti users. Further, a detailed analysis of the PoD provides insights regarding the definition and use of the inherent accuracy metric while highlighting opportunities to improve the underlying technologies. We conclude by providing specific recommendations for improving the usability of Jot and Graffiti for novice users and outlining several additional directions for future research.

1 Introduction

Mobile, handheld, computing devices are becoming increasingly common with personal digital assistants (PDAs), cellular phones, and pagers supporting an ever increasing array of activities. As usage expands, users not only record names, phone numbers, addresses, appointments, and reminders, but also access email, text pages, and the World-Wide Web (WWW). As the range of applications continues to expand, data entry becomes more important. To support these activities, systems provide a variety of text-entry techniques. Common alternatives include stylus-activated soft keyboards (e.g. a small QWERTY keyboard presented on a touch-sensitive screen), small physical keyboards, stylus-based gesture recognition (e.g.
Jot and Graffiti), and telephone keypad-based techniques (e.g. T9 and multitap). To date, few empirical studies have evaluated the relative merits of these varied techniques. In this article, we present an empirical comparison of the two most commonly used stylus-based gesture recognition techniques: Jot and Graffiti. Our focus is on novice users performing realistic tasks. We focus on novice performance for two reasons. First, many factors, including recommendations from friends and colleagues, cost, aesthetics, and first hand experience with the device, influence purchase decisions. Therefore, differences in initial performance may affect whether or not a device is purchased or used. Second, we believe that many individuals may use such technologies only occasionally. Therefore, their performance is likely to mimic that of novices rather than well-trained experts. While we provide detailed results regarding data entry rates, uncorrected errors, and satisfaction, the definition and use of a new event called a Period of Difficulty (PoD) also provides valuable insights.

Our results should prove useful to both product designers and researchers. For product designers, our results provide insights that will allow for more informed decisions regarding which technology to adopt. For researchers, our results highlight the importance of evaluating these technologies under realistic conditions. An analysis of PoD reveals specific problems with both Jot and Graffiti that highlight opportunities for additional research.

2 Related research

Stylus-based input to computers has been extensively investigated, but most research has focused on techniques for drawing or editing activities. Other researchers have investigated stylus-based data entry. However, many of these studies have focused on the recognition accuracy of the underlying algorithms and not on how effective these technologies are for realistic tasks. For example, LaLomia (1994) and Frankish et al.
(1995) both investigate the relationship between user acceptance and recognition accuracy. Tappert et al. (1990) provide a comprehensive review of the state-of-the-art in handwriting recognition. They discuss the difference between on-line and off-line recognition, the underlying digitization technologies, handwriting properties for several languages, and recognition problems. They continue to discuss the recognition process, including preprocessing, shape recognition, and post-processing. They conclude with a discussion of the results of the recognition process, applications, and several topics for future research. This article provides valuable insights for those individuals interested in implementing a gesture recognition system, but there is no discussion of the productivity users could expect when using state-of-the-art systems under realistic conditions. Goldberg and Richardson (1993) describe the design and preliminary evaluation of unistroke—a gesture-based text entry technique. Unistroke was designed to be easy to learn and fast. The individual strokes were also designed to be tolerant of the changes that frequently occur as individuals write more quickly. A small usability study was conducted during which users completed realistic tasks (e.g. sending email) while using a desk-top tablet. Through this study, usability problems were identified, basic data entry rates were determined, and theoretical data entry rates were computed. For example, several gesture-pairs that were frequently misrecognized were identified as well as difficulties recognizing the dot gesture (which was used to enter a space). While data entry was slow during initial usage, it improved to approximately 12 words per minute (wpm) after one week of use. Using detailed timing data for individual gestures, Goldberg and Richardson estimated that users may be able to enter text as quickly as 40 wpm given sufficient practice. 
MacKenzie and Chang (1999) recently described a comparison of two gesture recognizers. Their study utilized a Wacom tablet connected to a PC to evaluate the accuracy of Pen for Windows and Handwriter 3.3, both of which recognize block letters. Both systems were used in two modes: one recognized only lowercase letters while the other recognized lowercase and uppercase letters. Practice was limited to three 19-character phrases for each condition and participants were instructed to ignore recognition errors and continue with their task. During the actual study, participants entered numerous phrases of 19 characters. No symbols were included in any of these tasks. Initial performance resulted in 16.1 wpm with a 10% error rate. Overall, data entry speeds ranged from 16.9 to 17.6 wpm with error rates ranging from 7–13%. These results, as well as the extensive analysis of the recognition errors will be useful to researchers working to improve gesture recognition algorithms. However, participants were instructed not to correct errors, which limits the insight we can gain from these results regarding the practical application of these technologies where users correct errors when they occur. MacKenzie and Zhang (1997) describe an evaluation of the immediate usability of Graffiti. Their study was designed to determine recognition accuracy after limited exposure. Their study used a Fujitsu 325 Point pen-based computer, MS-Write 3.1 running on Pen Windows 1.0, and a pop-up window that provided access to Graffiti. Graffiti was locked in uppercase mode throughout the experiment. After studying a reference chart describing the required strokes for each letter for 1 min, participants entered the letters A–Z five times. Participants then practiced entering Graffiti for 5 min, and subsequently entered the letters A–Z five more times. After seven days, participants returned and entered the letters A–Z five more times. After 1 min of study, recognition accuracy was approximately 82%. 
After 5 min of practice, accuracy increased to approximately 96%. Again, the extensive analysis of recognition errors for individual letters will prove useful as new systems are designed. However, additional information is needed to understand the efficacy of Graffiti for realistic situations when users will be correcting errors and tasks may involve numbers, lowercase letters, or symbols. In a series of studies, data entry speeds ranged from approximately 14 to 18 wpm for various gesture recognition systems (McQueen et al., 1994, 1995; Chang and MacKenzie, 1994; MacKenzie et al., 1994a,b). However, all of these studies involved data entry using a Wacom tablet connected to a PC, not a PDA. Three of these studies focused exclusively on entering numeric characters. One study restricted the recognizer and tasks to lowercase letters, while the final study explored tasks involving both lowercase-only and mixed uppercase and lowercase letters. Neither of these last two studies required users to enter any numbers or symbols. More importantly, for all five experiments, participants were instructed to aim for both speed and accuracy, but to ignore any errors and continue with their task. The restricted character sets, use of a Wacom tablet rather than a PDA, and instructions to ignore errors make the reported data entry rates difficult to interpret with respect to realistic situations where users would correct errors while using mobile, handheld, computing devices. Lewis (1999) describes a study where users did interact with a handheld device (the Simon is an integrated, handheld device that provides the following features: cellular phone, wireless fax machine, pager, email, calendar, appointment scheduler, address book, calculator, notepad, and sketchpad). Participants used several input techniques, including simulated perfect handwriting recognition where they interacted with a drawing program and any attempt to enter a letter was considered acceptable. 
Participants were required to produce 100% accurate results for tasks that involved entering sentences and addresses. Handwriting speeds averaged 23.6 wpm for sentences and 21.7 wpm for addresses. The fact that users could simply write as fast as they wanted while assuming that every character would be recognized correctly explains these results. Each of these studies provides useful insights. Viewed as a whole, they suggest that gesture recognition may be an effective input technique for handheld devices. These studies also provided insights that will prove useful as the underlying recognition algorithms are improved or the required gestures are adjusted. At the same time, the experimental design employed by many of these studies also limits how we can use these results. When restricted character sets are used, our ability to generalize these results to tasks that involve more diverse character sets is limited. When users are instructed to ignore errors, we gain insights into the data entry rates that may be possible as the underlying recognition algorithms are improved, but we do not learn how existing technologies will perform when used under realistic conditions where users do correct errors. None of these studies explored gesture recognition using a handheld device and realistic tasks while simultaneously requiring users to produce accurate results by correcting errors. By instructing users to complete tasks as they would under realistic conditions, which includes correcting errors, we shift the focus from theoretical to realistic data entry rates. Earlier studies provided insights into data entry rates that may be possible given 100% accurate recognition and error-free performance by users. In contrast, our results will provide insights into data entry rates that can be expected under realistic conditions using state-of-the-art recognition algorithms. 
3 Inherent accuracy and immediate usability

As Goldberg and Richardson (1993) developed unistroke, one of their three design criteria was to make the gestures easy to learn. To achieve this goal, they made ‘many unistroke characters the same (or similar) to ordinary Roman characters’. Similarly, when evaluating Graffiti, MacKenzie and Zhang (1997) define inherent accuracy as ‘the extent to which Graffiti strokes matches letters in the Roman alphabet’. Inherent accuracy is presented ‘as a first-level approach to measuring the immediate usability of Graffiti’. At the same time, they acknowledged that the required strokes may match the uppercase version of a letter, the lowercase version, both, or neither. Further, even if the required strokes match the corresponding character, they may not match the way an individual user writes that particular character. Finally, strokes that match the corresponding letter may be easier for the user to remember while simultaneously making the recognition task more complex. Therefore, it is important that gestures be evaluated from the perspective of both the user (to ensure that users can learn, remember, and use the gestures) and the recognition algorithm (to ensure that the algorithm can identify the various gestures effectively).

The relationship between inherent accuracy and immediate usability has not been empirically validated and there are reasons to interpret inherent accuracy cautiously. At the same time, data reported by MacKenzie and Zhang (1997) indicate that strokes which resemble the resulting character are learned and used more easily. After 5 min of practice, users were more successful when the required strokes matched either the uppercase or lowercase version of the resulting letter.
More specifically, users were successful over 96% of the time when the required strokes matched the resulting letter, but only 93% of the time when the required strokes did not match either the uppercase or lowercase version of the resulting letter (t(24)=−2.2, p<0.05). As a result, we view inherent accuracy as a quick, but incomplete, assessment of how rapidly novices can learn, remember, and use a set of gestures. The relationship between inherent accuracy and practiced-performance is less certain.

4 Research objectives and hypotheses

The current experiment was designed to evaluate the relative effectiveness of Jot and Graffiti for a variety of tasks users will encounter with increasing frequency as Internet-enabled mobile handheld computing devices become more common. This study focuses on novice performance. Participants were instructed to complete the task as they would under realistic usage conditions, balancing speed and accuracy. As a result, we expect most, but not all, errors to be corrected. Our results will provide insights regarding the adoption of the technology (since such decisions are often made based on limited interactions), the effectiveness of these techniques for infrequent users, and changes that would enhance these technologies. The issues explored in the current study, including inherent accuracy, only address the usability of gestures from the users' perspective.

Both Jot and Graffiti allow letters, numbers, and symbols to be entered using one or more strokes of the stylus, but a careful review of the required strokes reveals important differences (see Appendix B). As discussed earlier, we use inherent accuracy as a quick, but incomplete, assessment of how rapidly novices can learn, remember, and use a set of gestures. Table 1 reports inherent accuracy scores for Jot and Graffiti.
Scores are provided for uppercase letters, lowercase letters, numbers and symbols with each score indicating the percentage of strokes that closely resemble the resulting character. Note the large difference between the inherent accuracy scores for uppercase and lowercase for Graffiti. Since we cannot know which form of the letter users are likely to employ when interacting with these systems, we suggest using the lower of these two scores when assessing a set of gestures. Therefore, we use the lowercase score for Jot and the uppercase score for Graffiti as this provides a more conservative assessment of these gestures. Overall, these values suggest that Jot should be easier for novices to learn, remember, and use.

Table 1 Inherent accuracy of Jot and Graffiti for lowercase letters, uppercase letters, numbers and symbols. Values represent the percentage of characters for which the required stroke results in a pattern that is visually similar to the resulting character.

                        Jot    Graffiti
Lowercase letters (a)   100    42
Uppercase letters (a)    89    69
Numbers                 100    90
Symbols (b)              81    52

(a) Graffiti scores for lowercase and uppercase letters are from MacKenzie and Zhang (1997).
(b) The 32 symbols accessible using the punctuation shift mechanism within Graffiti on the Palm III were used for this analysis (the tab character was excluded since there is no corresponding visual representation).

While many Graffiti strokes do resemble the corresponding character (e.g. B, C, and D), others have an appearance that may remind users of the corresponding character only after the connection is established (e.g. A, F, and T), and others appear to have little in common with the character they generate (e.g. [, ”, and %). Overall, numbers are entered by the most natural set of strokes, many letters are associated with less natural strokes, and symbols often require more cryptic strokes. A special stroke puts Graffiti into uppercase-mode to allow capital letters to be entered and a second special stroke allows symbols to be entered. The input region is divided into two sections: one for letters and the other for numbers. Symbols can be entered in either region.

For Jot, the majority of characters are entered using one or more strokes that resemble the resulting character (e.g. A, B, 1, 2, {, ?, and ∗) while a few require less intuitive strokes (e.g. ”). Some characters can be entered using a less intuitive optional sequence of strokes that appear to correspond to how the character is entered using Graffiti. Overall, lowercase letters and numbers are entered by the most natural set of strokes while uppercase letters and symbols are occasionally associated with less natural strokes. As with Graffiti, the input region is divided into two regions. Lowercase letters are entered on one side and numbers on the other. Unlike Graffiti, uppercase letters are entered by making the strokes for the lowercase letter on the border of these two regions.
Symbol-mode is entered using a button that is available on the left side of the input region or by entering an equivalent stroke.

Given the relationship between inherent accuracy and immediate usability, and the results reported in Table 1, our research hypothesis is that individuals using Jot will be able to complete the assigned tasks more quickly than individuals using Graffiti. The primary contributions of this article are an examination of the effectiveness of these two techniques for novice users and the identification of areas where additional research may improve the techniques.

5 Method

5.1 Subjects

Thirty-one students enrolled at UMBC volunteered to participate in the study. Participants received a payment of $10.00 as compensation for their time. To better represent the potential users of these techniques, participation was restricted to students who were not in an information technology or engineering-based major (e.g. Information Systems, Computer Science, or Computer Engineering). Participation was also limited to individuals who did not have prior experience with Jot or Graffiti. Informed consent was obtained prior to participation. A between-groups design was used for this study, with participants randomly assigned to use either Jot or Graffiti when completing their tasks. Sixteen participants used Graffiti and fifteen used Jot.

Demographic information, including age, gender, and computer experience, was gathered at the conclusion of the study. Sixteen participants were female. The average age of the participants was 23.4 (stdev: 6.9). All participants were regular computer users. Ten participants used PDAs, cellular phones, or pagers an average of 2.2 times per day. The remaining 21 participants did not use any of these devices. Of the ten participants who did use these devices, five used Jot during the study and five used Graffiti.
5.2 Apparatus

Our aim was to evaluate the relative efficacy of the underlying gestures required by Jot and Graffiti rather than the specific implementations of the two underlying algorithms. We believe that by focusing on the efficacy of the gestures we will provide insights that can guide the refinement of both techniques to better meet the needs of the intended users; the selection of a platform for a given application based upon the input techniques supported; and the selection of an input technique to implement on a given platform, assuming the platform can support both techniques equally well.

Jot and Graffiti can be used on a single PDA. For example, Graffiti is preinstalled on Palm devices and Jot is available for the Palm OS. However, existing implementations of these two techniques appear to have been optimized for different platforms. As a result, Jot running under the Palm OS is substantially slower than Jot running under Windows CE (it is also substantially slower than Graffiti running under the Palm OS). Therefore, to ensure that the focus was on the efficacy of the underlying gestures and not the efficiency of a specific implementation of the underlying algorithm for a particular platform, it was necessary to use two PDAs for this study: a Palm III for Graffiti and a Casio Cassiopeia E100 for Jot.

Since using two different PDAs introduces an additional variable, efforts were taken to minimize the impact of using different platforms. Devices were selected that are similar in size, weight, and display area. Most importantly, the size of the display/input region was 2.4×3.1 in. on both devices. The experiment was designed such that a region approximately 0.75 in. high near the bottom of the screen was used for input on both devices while the text entered by the participants was displayed using the upper portion. Further, the devices were configured such that all participants interacted with a monochrome display with no audio feedback.
The ink-trace feature available on the Casio was disabled. Participants used the built-in memo application when using Graffiti and the Note Taker application when using Jot.

5.3 Tasks

Six tasks that differed in length and content were utilized. Task design was guided by a fundamental goal of engaging our study participants in tasks that are representative of those they would encounter when using an Internet enabled mobile device. Task one involved entering a name and address as may be done when completing an entry in an application designed to keep track of contact information (e.g. an address book). This task was selected due to the various numbers and symbols that would be required when recording detailed contact information. Tasks two and three involved entering URLs. Relatively simple URLs were utilized since more complex URLs are likely to be bookmarked or obtained by entering a simple URL and subsequently navigating to the desired location. URLs are unique in that they are composed of a sequence of letters, numbers, and symbols without intervening spaces. Tasks four through six involved entering varying amounts of basic alphanumeric data, but unlike task one the use of symbols and formatting is minimal. Task four involved entering two words, as may be done when recording the topic of an upcoming appointment. Task five involved entering a short sentence, which could correspond to a longer description of a meeting or a brief reply to an email or page. Task six involved entering a single paragraph as may be done when responding to an email message.

As shown in Table 2, a total of 50 unique numbers, symbols, and letters were entered during the six tasks. This included the digits 0–9, eight different symbols/punctuation marks, nine uppercase letters and 23 lowercase letters. The exact text for each task is included in Appendix A.

Table 2 Characteristics of tasks one through six

Task    Total characters (a)   Unique characters (b)   Actual words (c)   Words for WPM (d)
1        86                    40                      13                 17.2
2        18                    14                       1                  3.6
3        26                    15                       1                  5.2
4        18                    12                       2                  3.6
5        44                    16                       8                  8.8
6       223                    28                      41                 44.6
Total   410                    50                      75                 83.0

(a) Character count includes spaces. Required carriage returns are included for task one.
(b) Each unique uppercase and lowercase letter, number, and symbol/punctuation mark was counted once.
(c) Actual word count computed assuming spaces define word boundaries.
(d) Words for WPM calculations based upon a frequently used standard for written English of 5 characters/word (4 letters plus a space). Average word lengths differ in other languages.

Fig. 1 shows how frequently each letter is used in English text (Pratt, 1939) and how frequently each letter is used during our six tasks. While the frequencies are not identical, letters that occur frequently in written English occur frequently in our tasks and letters that occur infrequently in written English occur infrequently in our tasks.

Fig. 1 Frequency of each letter in English text and the six experimental tasks.

5.4 Dependent variables

Dependent variables were defined to allow performance, satisfaction, and the underlying process to be assessed. For performance, we focus on data entry rates, measured in corrected wpm, and uncorrected errors. For satisfaction, we administered a questionnaire that investigated feelings regarding how easy it was to enter text, how quickly text could be entered, and the acceptability of the accuracy of the technique. The questionnaire also investigated whether the individual was comfortable using the device, felt physically tired when using the device, or would be interested in using the device in the future. All responses were provided using a scale from 1 to 5 (1=strongly agree, 5=strongly disagree).

Novices often experience difficulties when interacting with a new technology. When using Jot or Graffiti, gestures may be entered incorrectly resulting in the wrong character. When this occurs, users must delete the incorrect character and enter the gesture again. Occasionally, when users experience significant difficulty, multiple attempts are required before a character is entered correctly. We define situations where users experience significant difficulty as a Period of Difficulty (PoD). To operationalize this concept, we define an upper limit to the data entry rates we expect to observe and a lower limit for the number of attempts required before we classify the event as a PoD. Given our pilot study, we expected data entry rates to be in the range of 4–9 wpm (as opposed to the 14–18 wpm reported elsewhere). Assuming four or more gestures (e.g.
enter a gesture, delete it, enter the gesture again, delete it, …) are involved in a PoD, we define a PoD to be a period of five seconds (4 gestures at 9 wpm=5.3 s) or longer during which a participant made multiple attempts before entering a single character correctly.

5.5 Procedure

Input technique was treated as a between-group variable with each participant using either Jot or Graffiti. Task was treated as a within-subject variable with each participant completing all six tasks in a unique random order. After reading and signing the consent form, participants were given the device they would be using (turned off). At this time, they were asked to write down the types of activities they would expect the device to support. Next, they provided similar information by selecting activities from a predefined list. This information was gathered as part of a long-term project exploring the relationship between the physical characteristics of handheld devices and perceived uses. Given the similarity of the devices used in the current study, these data were not expected to provide any insights at this time. Throughout the study, participants were free to hold the device in their hand or rest it on a table.

Participants were provided with the standard reference chart illustrating the gestures recognized by the technique they would be using. The reference chart was available throughout the study. Next, they were provided with a brief orientation to the technique (i.e. Jot or Graffiti) they would be using. The orientation included a demonstration of how to write several lowercase letters, uppercase letters, numbers, and symbols. While focusing only on uppercase letters, MacKenzie and Zhang (1997) reported a dramatic improvement in recognition accuracy with Graffiti after only 5 min of use. We provided our participants with 10 min to practice entering data.
To structure their practice, they were given a collection of example tasks that were similar in length and content to the actual tasks they would be assigned. Participants were free to practice using these sample tasks or any other text.

Participants were provided with one task at a time to ensure that the tasks were completed in the appropriate order. For each task, the participant was given a sheet of paper with the required text. They were allowed to review the text and begin the task when they were ready. Participants were instructed to complete the task as they would under realistic usage conditions, balancing speed and accuracy. As a result, participants corrected most, but not all errors. The experimenter did not provide help during the experimental sessions. Participants were allowed to take a break before beginning each task. These breaks averaged approximately 30 s. Participant interactions were videotaped to allow for a detailed analysis of the results.

After completing all six tasks, participants completed two additional questionnaires. The first investigated participant perceptions of the technique they used (i.e. Jot or Graffiti). The second gathered basic demographic information.

6 Results

Participant expectations regarding the activities that would be supported by the two devices suggest that these two devices were viewed similarly (p(29)=1.53, n.s.). Interestingly, 75% of our participants chose to place the PDA on a table, rather than holding it in their hand, while completing their tasks.

To account for variable word lengths among tasks, the 5-character word counts from the last column of Table 2 were used throughout the data analysis. The time required to complete each task was determined through a detailed analysis of the resulting video. Since tasks were distributed one at a time, it was possible to determine the exact time each participant started and completed each task.
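As a minimal sketch of this conversion, the Python snippet below turns a character count and a task time into a data entry rate using the 5-characters-per-word convention from Table 2. The function names and the 300 s completion time are illustrative assumptions and are not taken from the study. The same arithmetic also reproduces the 5.3 s figure used in the PoD definition in Section 5.4 (4 gestures at 9 wpm), under the assumption that each gesture corresponds to one character.

```python
CHARS_PER_WORD = 5  # convention from Table 2: 4 letters plus a space


def words_for_wpm(total_characters: int) -> float:
    """Convert a character count (spaces included) into 5-character words."""
    return total_characters / CHARS_PER_WORD


def data_entry_rate_wpm(total_characters: int, seconds: float) -> float:
    """Words per minute for a task of the given length completed in `seconds`."""
    if seconds <= 0:
        raise ValueError("task time must be positive")
    return words_for_wpm(total_characters) / (seconds / 60.0)


def seconds_for_characters(n_characters: int, wpm: float) -> float:
    """Time needed to enter `n_characters` at a steady rate of `wpm`."""
    characters_per_second = wpm * CHARS_PER_WORD / 60.0
    return n_characters / characters_per_second


if __name__ == "__main__":
    # Task 6 has 223 characters (Table 2); 300 s is a made-up completion time.
    print(round(words_for_wpm(223), 1))              # 44.6 words
    print(round(data_entry_rate_wpm(223, 300.0), 2))  # 8.92 wpm
    # Assumption: one gesture corresponds to one character.
    print(round(seconds_for_characters(4, 9.0), 1))   # 5.3 s
```

Keeping the word count tied to characters rather than to actual words is what makes rates comparable across tasks whose real word lengths differ, such as URLs and ordinary sentences.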
Given 30 frames per second, a frame-by-frame analysis allowed for sub-second timing accuracy. The time to complete each task was converted into a data entry rate (corrected wpm). As would be expected during realistic usage, our participants corrected most, but not all, errors. Uncorrected error rates were computed using a modified version of the technique described by Soukoreff and MacKenzie (2001). This technique counts the minimum number of edits (i.e. insertions, deletions, and substitutions) required to convert the produced text to the desired text (see Appendix A). This value is then normalized by dividing by the length of the produced text or desired text, whichever is longer. This technique was modified to address the additional editing capabilities made available through stylus-based interactions. More specifically, the substitution primitive was modified to allow multiple characters to be replaced by a single character (i.e. select a sequence of incorrect characters, enter the desired character). Periods of Difficulty were also identified, and durations recorded, during the frame-by-frame video analysis. Participants were allowed to move freely while completing the tasks. Consequently, nine participants positioned the device such that we were unable to view the tasks in enough detail to identify PoD. Five other participants positioned the device such that we were unable to view a subset of their tasks sufficiently to identify PoD. For each task, we were unable to view between 9 and 12 attempts. We were able to view 65% of the tasks in sufficient detail that we could determine whether or not any PoD occurred. The PoD results reported below are based upon an analysis of this subset of the tasks. To facilitate this analysis, we divide characters into two sets: basic alphanumeric characters (i.e. letters, numbers, space, period, and return) and symbols (i.e. all other characters required to complete the tasks).
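The measures described above can be illustrated with a short sketch. It is not the authors' analysis code: the function names and the example strings are assumptions, and the edit distance is the standard Levenshtein distance, without the multi-character substitution modification described in the text. The last function reproduces the arithmetic behind the five-second PoD threshold (4 gestures at 9 wpm).

```python
FPS = 30            # video frame rate stated in the text
CHARS_PER_WORD = 5  # the 5-character word convention used for word counts


def corrected_wpm(task_words: float, start_frame: int, end_frame: int) -> float:
    """Data entry rate in words per minute for one task timed from video frames."""
    seconds = (end_frame - start_frame) / FPS
    return task_words / (seconds / 60.0)


def edit_distance(produced: str, desired: str) -> int:
    """Minimum number of insertions, deletions and substitutions (Levenshtein)."""
    prev = list(range(len(desired) + 1))
    for i, p in enumerate(produced, start=1):
        curr = [i]
        for j, d in enumerate(desired, start=1):
            curr.append(min(prev[j] + 1,               # delete p
                            curr[j - 1] + 1,           # insert d
                            prev[j - 1] + (p != d)))   # substitute, or match
        prev = curr
    return prev[-1]


def uncorrected_error_rate(produced: str, desired: str) -> float:
    """Edit distance normalized by the longer of the two texts."""
    longest = max(len(produced), len(desired))
    return edit_distance(produced, desired) / longest if longest else 0.0


def seconds_for_gestures(n_gestures: int, wpm: float) -> float:
    """Time needed to enter n_gestures characters at a given entry rate."""
    chars_per_second = wpm * CHARS_PER_WORD / 60.0
    return n_gestures / chars_per_second
```

For instance, a hypothetical task of 3.6 five-character words completed in 900 frames (30 s) corresponds to 7.2 wpm, and `seconds_for_gestures(4, 9)` gives roughly 5.3 s, the figure quoted in the PoD definition.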
6.1 Data entry rates (corrected wpm) Means and standard deviations for data entry rates (measured in corrected wpm) are reported in Table 3. A one-way analysis of covariance (ANCOVA) with repeated measures for task was used to assess the effect of input technique on task completion times. The location of the device during the task (i.e. held in hand or placed on the desk) was used as a covariate. Significant main effects were found for both technique (F(1,28)=11.42, p<0.005) and task (F(5,140)=3.08, p<0.02). A significant interaction between technique and task was also found (F(5,140)=4.29, p<0.002). The location of the device did not have a significant effect on data entry rates (F(1,28)=0.57, n.s.). Overall, Jot allowed users to complete these tasks more quickly. The interaction indicates that the benefits of Jot varied from one task to another.
Table 3 Means (in wpm) and standard deviations (in parentheses) for data entry rates for each of six tasks with both Jot and Graffiti. Bold entries indicate significant differences.
Task       1           2           3           4           5           6
Jot        5.10(2.14)  7.91(2.61)  7.32(2.52)  7.35(3.51)  8.79(2.85)  7.74(2.55)
Graffiti   4.30(1.11)  5.01(2.20)  3.81(1.40)  4.99(2.04)  6.14(1.89)  (1.64)
Planned comparisons for the effect of technique on task completion times for individual tasks were also performed. The results indicate that individuals using Jot were significantly faster than individuals using Graffiti for all tasks except task one (see Tables 3 and 4).
Table 4 Results of planned statistical evaluations of the effect of technique on task data entry rates for individual tasks
Task  Results
1     F(1,28)=1.72, n.s.
2     F(1,28)=11.12, p<0.005
3     F(1,28)=23.06, p<0.001
4     F(1,28)=5.19, p<0.05
5     F(1,28)=9.89, p<0.01
6     F(1,28)=8.91, p<0.01
6.2 Uncorrected error rate Means and standard deviations for uncorrected error rates are reported in Table 5. A one-way analysis of covariance (ANCOVA) with repeated measures for task was used to assess the effect of input technique on uncorrected error rates. The location of the device during the task was used as a covariate. A significant main effect was not found for either technique (F(1,28)=0.26, n.s.) or task (F(5,140)=0.12, n.s.). No significant interaction between technique and task was found (F(5,140)=1.37, n.s.). The location of the device did not have a significant effect (F(1,28)=0.05, n.s.).
Planned comparisons for the effect of technique on uncorrected errors for individual tasks were also performed. The results indicate that there were no significant differences (see Table 6).
Table 5 Means and standard deviations (in parentheses) for uncorrected error rates for each of six tasks with both Jot and Graffiti
Task       1             2             3             4             5             6
Jot        0.109(0.244)  0.074(0.165)  0.077(0.159)  0.037(0.098)  0.152(0.379)  0.154(0.389)
Graffiti   0.171(0.562)  0.260(0.522)  0.132(0.219)  0.139(0.366)  0.128(0.423)  0.060(0.173)
Table 6 Results of planned statistical evaluations of the effect of technique on uncorrected error rates for individual tasks
Task  Results
1     F(1,28)=0.15, n.s.
2     F(1,28)=1.94, n.s.
3     F(1,28)=0.58, n.s.
4     F(1,28)=0.92, n.s.
5     F(1,28)=0.01, n.s.
6     F(1,28)=0.70, n.s.
6.3 Periods of difficulty Comparisons regarding the likelihood of a PoD were performed using the χ² test. Nineteen PoD were observed during 61 Graffiti tasks. Twelve PoD were observed during 61 Jot tasks. As discussed above, we were interested in the rate at which PoD occur when users are entering two classes of characters: basic alphanumeric characters and symbols. Symbols were significantly more likely to result in a PoD than basic alphanumeric characters (χ²(1)=401.7, p<0.01). Overall, 12.4% of the attempts to enter symbols resulted in a PoD while only 0.2% of the attempts to enter basic alphanumeric characters resulted in a PoD. A more detailed analysis of where PoD occur when using Jot and Graffiti provides interesting results. Attempts to enter basic alphanumeric characters are more likely to result in a PoD when using Graffiti (0.40%) than when using Jot (0.05%) (χ²(1)=11.6, p<0.01). In contrast, attempts to enter symbols are more likely to result in a PoD when using Jot (18.5%) than when using Graffiti (5.9%) (χ²(1)=3.9, p<0.05). 6.4 Subjective satisfaction After completing all six tasks, participants completed a questionnaire. One participant did not respond to the first question, leaving 30 sets of responses. The first four questions inquired about how easy it was to complete the address (task 1), URL (tasks 2 and 3), appointment (tasks 4 and 5), and email (task 6) tasks. Questions 5–8 investigated perceived speed for completing these tasks. Questions 9–12 investigated the participants' perceptions of the number of errors made during the tasks. Questions 13–15 investigated perceived comfort, feelings of being physically tired, and interest in using the device in the future, respectively. Reliability was assessed using Cronbach's alpha.
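As a reminder of what this statistic measures, the following is a minimal sketch of Cronbach's alpha computed from a participants-by-questions matrix of ratings. The function name and the example ratings are hypothetical; the study's raw questionnaire responses are not reproduced here, and any reverse-scoring of negatively worded questions is assumed to have been done beforehand.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows, one row of item scores per participant."""
    k = len(responses[0])                       # number of questionnaire items
    items = list(zip(*responses))               # transpose: one column per item
    item_variances = sum(variance(col) for col in items)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variances / total_variance)


# Hypothetical ratings on a 1-5 scale (three participants, two questions).
example = [[1, 2], [2, 1], [3, 3]]
alpha = cronbach_alpha(example)
```

Alpha approaches 1 when participants who rate one question highly also rate the others highly, which is why a high value is read as evidence of a reliable questionnaire.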
The resulting value of α=0.94 indicates that the questionnaire is highly reliable and substantiates the robustness of these results. Means and standard deviations are reported in Tables 7–10. A one-way analysis of variance (ANOVA) with repeated measures for question was used to assess the effect of input technique on participant responses. Of particular interest were planned comparisons of responses for individual questions. There were few significant differences. Participants using Jot felt entering URLs was easier and faster than participants using Graffiti (F(1,28)=5.8, p<0.05; F(1,28)=4.6, p<0.05, respectively). Participants using Jot were also more interested in using the device again in the future (F(1,28)=8.8, p<0.01).
Table 7 Means and standard deviations (in parentheses) for participant responses to questions regarding the ease of completing the tasks. Bold entries indicate significant differences. Range is 1–5, with 1 being the most positive response.
Ease of entering…  Address     URL         Email response  Appointment
Jot                2.53(0.92)  1.53(0.64)  3.00(1.25)      1.93(0.92)
Graffiti           3.06(1.18)  2.38(1.09)  2.93(1.19)      2.38(1.15)
Table 8 Means and standard deviations (in parentheses) for participant responses to questions regarding perceived speed of completing the tasks. Bold entries indicate significant differences. Range is 1–5, with 1 being the most positive response.
Perceived speed of entering…  Address     URL         Email response  Appointment
Jot                           2.60(0.99)  1.53(0.64)  2.07(1.03)      3.27(1.28)
Graffiti                      3.19(1.11)  2.31(1.14)  2.19(1.22)      3.19(1.17)
Table 9 Means and standard deviations (in parentheses) for participant responses to questions regarding their perception of the number of errors made while completing the tasks. Range is 1–5, with 1 being the most positive response.
Perceived number of errors while entering…  Address     URL         Email response  Appointment
Jot                                         2.33(0.90)  1.60(0.63)  1.93(0.59)      2.80(1.32)
Graffiti                                    2.75(1.24)  2.31(1.30)  2.38(1.20)      2.56(1.32)
Table 10 Means and standard deviations (in parentheses) for participant responses to questions regarding comfort, feeling physically tired, and interest in using the device again. Bold entries indicate significant differences. Range is 1–5, with 1 indicating strong agreement with the statement.
Statement                                              Jot         Graffiti
Comfortable while using device to enter text           2.33(0.90)  2.75(1.34)
Felt physically tired entering text with this device   3.07(1.10)  2.69(0.95)
Interested in using this device in the future          1.53(0.64)  2.56(1.09)
7 Discussion The participants' decision to hold the device in their hand, or place it on the desk, while completing the tasks did not have a significant effect on data entry rates or the number of uncorrected errors. Overall, individuals using Jot completed the tasks used in this experiment faster than individuals using Graffiti with no corresponding increase in uncorrected errors. The limited differences in satisfaction also favored Jot while PoD results were mixed. These results are discussed in more detail below. 7.1 Data entry rates While Jot did allow for more rapid data entry, under the conditions studied here both Jot and Graffiti resulted in relatively slow data entry. Graffiti users averaged only 4.95 wpm, with a maximum speed of 6.14 wpm for task 5. Jot users fared better, averaging 7.34 wpm, with a maximum of 8.73 wpm on task 5. Similarly, the fastest individual Graffiti user averaged only 7.65 wpm for the six tasks while the fastest individual Jot user averaged 11.62 wpm for the six tasks. Each of these results indicates a 40–50% increase in data entry rates when using Jot as compared to Graffiti. Comparing our results with those reported elsewhere raises serious concerns. The studies summarized earlier often reported data entry rates ranging from 14 to 18 wpm, but even our fastest participant never reached these speeds. We believe that the different experimental methodologies and tasks used in these earlier studies may explain these discrepancies. The earlier studies did limit practice, but did not use PDAs and focused on tasks that may not accurately represent the activities PDA users will engage in. More importantly, these earlier studies had users ignore errors and employed restricted alphabets.
This can be viewed as a speed-accuracy trade-off, with earlier studies focusing on speed and the current study placing an increased emphasis on accuracy. We believe that our results of 4–9 wpm accurately represent performance for new users after limited practice when using current state-of-the-art technologies. In contrast, the 14–18 wpm reported elsewhere may better represent a theoretical optimal level of performance, assuming 100% accurate gesture recognition and error-free user performance. Finally, the 24 wpm reported by Lewis (1999) may represent an appropriate target for expert performance. 7.2 Periods of difficulty PoD highlight significant problems novices experience that could be addressed as the technology evolves, either by redefining the required strokes or by adding new alternatives for problematic characters. Overall, symbols are more problematic than basic alphanumeric characters with both Jot and Graffiti. This is not surprising since the most natural strokes are assigned to more frequent characters (e.g. numbers and letters). While PoD are not particularly common when entering basic alphanumeric data, they are eight times more common with Graffiti than Jot. In contrast, PoD are quite common with both Jot and Graffiti when entering symbols, with Jot resulting in more than three times as many PoD as Graffiti. 7.3 Revisiting inherent accuracy The PoD results for symbols raise important questions regarding the use of inherent accuracy to predict novice user performance. Jot received higher inherent accuracy scores for symbols, yet attempts to enter symbols using Jot resulted in significantly more PoD. This suggests that a new approach for computing inherent accuracy may be useful. For example, scores could be based not only on the final appearance of the stroke, but also on the way (e.g. direction) the stroke is entered. Weighting scores based on how frequently each character is utilized may also prove useful (MacKenzie and Zhang, 1997).
7.4 The importance of individual characters While users entered a total of 50 unique letters, numbers, and symbols during the six tasks, seven characters accounted for all PoD with Graffiti while six characters accounted for all PoD with Jot. Jot users had difficulty entering commas, open- and close-parentheses, ‘B’, ‘@’, and ‘h’. While each of these indicates a potential problem, 67% of the attempts to enter a comma resulted in a PoD, suggesting serious difficulties with the stroke required for this character. Graffiti users had difficulty with ‘/’, ‘w’, ‘v’, ‘u’, ‘d’, ‘p’, and ‘e’. Difficulties with ‘e’ are noteworthy since this is the single most common letter in written English (Pratt, 1939). Three other examples are also noteworthy. First, over 18% of the attempts to enter the letter ‘v’ resulted in a PoD. This is not surprising given an earlier observation that only 36% of the attempts to enter a ‘v’ were successful using Graffiti (MacKenzie and Zhang, 1997). Second, over 27% of the attempts to enter a ‘/’ resulted in a PoD. Participants entered punctuation mode without any difficulty, but repeatedly entered the subsequent stroke in the wrong direction (down from right to left instead of up from left to right). Finally, the lowercase ‘w’ that began the URLs introduced an interesting, unexpected, and frequent problem for Graffiti users. Normally, to get a single uppercase letter, the user moves into ‘shift’ mode using a single vertical stroke, a second vertical stroke changes the system into ‘cap-lock’ mode, and a third returns the system to the ‘normal’ mode where lowercase letters can be entered. However, when creating a new entry (e.g. memo, appointment, to-do item) on the Palm, the system automatically starts in shift mode so the first character (if it is alphabetic) is capitalized. Unfortunately, this is a special shift mode that does not operate as users would expect.
Instead, the first vertical stroke that should change the system from shift- to cap-lock mode is ignored. As a result, to move from this shift mode to normal mode requires three vertical strokes instead of the two users expect. This created confusion for many of our participants. 7.5 Eliminating PoD With practice, it is likely that PoD would occur less often and may disappear entirely for certain characters. At the same time, PoD can play an important role in the acceptance of the technology—especially if users experience difficulty with important characters. For instance, if the system is used for browsing the WWW, potential users may evaluate the technology by entering a URL. If they experience problems entering a ‘/’, which is often the case for new Graffiti users, this could affect their acceptance of the system. We suggest that by identifying PoD, we highlight opportunities to improve the technology. This may involve redefining the required strokes or adding new alternatives for problematic characters. For instance, redefining the stroke required by Graffiti to enter a ‘/’ to be down from right to left (as it is in Jot) would eliminate much of the confusion regarding this character. Similarly, the stroke used for commas in Jot resulted in numerous PoD, while the corresponding stroke in Graffiti did not result in any significant difficulties. Combining the Graffiti-style comma with the Jot-style ‘/’ may improve usability, but the recognition accuracy of the system would need to be evaluated given the similarity of these two strokes. Similarly, Graffiti provides an alternative for ‘v’ that can reduce the difficulty users experience with this letter, but this alternative is not presented as part of the default reference material provided with Graffiti. The problems Graffiti users had moving between the shift, cap-lock, and normal modes when entering the first character highlight a different kind of problem. 
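The mode behaviour described above can be summarized as a small state machine. The sketch below is only a model of the behaviour reported in the text, not Palm's implementation; the `INITIAL_SHIFT` state name and the choice to model the ignored stroke as a transition into ordinary shift mode are assumptions made for illustration.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    SHIFT = "shift"
    CAP_LOCK = "cap-lock"
    INITIAL_SHIFT = "initial shift"  # hypothetical name for the special mode


# Effect of one vertical stroke in each mode, as described in the text.
# INITIAL_SHIFT -> SHIFT models the "ignored" stroke: the device still
# behaves as if it were in shift mode afterwards (a modelling assumption).
NEXT_MODE = {
    Mode.NORMAL: Mode.SHIFT,
    Mode.SHIFT: Mode.CAP_LOCK,
    Mode.CAP_LOCK: Mode.NORMAL,
    Mode.INITIAL_SHIFT: Mode.SHIFT,
}


def strokes_to_normal(start: Mode) -> int:
    """Count the vertical strokes needed to reach normal (lowercase) mode."""
    mode, strokes = start, 0
    while mode is not Mode.NORMAL:
        mode = NEXT_MODE[mode]
        strokes += 1
    return strokes
```

In this model `strokes_to_normal(Mode.SHIFT)` is 2 while `strokes_to_normal(Mode.INITIAL_SHIFT)` is 3, which is the mismatch between expected and actual behaviour that confused participants entering a lowercase first character.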
Here, users understood and could enter the required strokes, but the system reacted differently when the first character was entered, which created confusion. Modifying the system to always move between these modes in the same way would eliminate this confusion and reduce the PoD rate for new Graffiti users. 8 Conclusions Overall, novices are more efficient using Jot. For five out of six tasks, Jot users entered text more quickly without any corresponding increase in error rates. The only task where Jot did not result in significantly faster data entry was task one, where users entered a complete address. For this task, the larger number of symbols combined with the significantly higher symbol-related PoD rate for Jot eliminated the advantage Jot users experienced with the other tasks. While Jot was faster, our participants failed to enter text as quickly as can be achieved using a stylus-activated keyboard (Lewis, 1999). Our data entry rates were also slower than those reported elsewhere for gesture-based systems (McQueen et al., 1994, 1995; Chang and MacKenzie, 1994; MacKenzie et al., 1994a,b). As discussed earlier, we believe our results are representative of the data entry rates that can be achieved when novices complete realistic tasks under realistic conditions using state-of-the-art technology. In contrast, by employing restricted alphabets and having participants ignore errors, earlier studies may be more representative of a theoretical optimal level of performance that may be possible given 100% recognition accuracy and error-free user performance. PoD highlight opportunities to improve the technology as well as events that may negatively impact initial impressions when potential users evaluate a new system. While some PoD will disappear with experience, many highlight opportunities to make simple changes that could enhance the initial usability of the technology.
Additional research is needed to better understand the relationship among PoD, data entry rates, and practice. PoD will become less frequent with practice, but how much practice is necessary before PoD become less common? Are there changes that can be made to reduce the initial rate of PoD without sacrificing long-term usability and performance? How fast can more experienced users enter various kinds of data using Jot and Graffiti? How much practice is required before users reach a speed they consider acceptable? Can Jot or Graffiti compete with other technologies for entering data on mobile, hand-held, computing devices? Additional studies that focus on the relationship between inherent accuracy and data entry rates for both novices and more experienced users could also prove interesting. Acknowledgements We would like to thank Wendy Castleman for comments that assisted us in improving this paper. This research was funded by Motorola. The authors gratefully acknowledge their generous support, which made this research possible. Appendix A Experimental tasks
Task one: Enter this address: John Doe 8374 Maple Dr. Apt. 36-C Baltimore, MD 21250 +1-410-391-4398 jdoe@gl.umbc.edu
Task two: Enter this URL: www.giraffe837.com
Task three: Enter this URL: www.travelocity.com/vaca23
Task four: Enter this appointment description: Department Meeting
Task five: Enter this appointment description: Meeting with Bob and Sue about annual budget
Task six: Enter this reply to an email: The meeting this Tuesday has been changed to 2 pm. Please notify me if there is a conflict in your schedule. Bring all materials regarding Alpha project with you to this meeting. I will send more details later in the week.
Appendix B Graffiti and Jot Gestures Gestures provided on the standard quick reference guide for Graffiti including those used to write letters, numbers, and 32 symbols available using the punctuation shift mechanism in Graffiti.
While alternative gestures are available for some characters, these are not described on the quick reference guide and are therefore not considered in the context of this study. A space is generated by entering the stroke for a ‘–’ without entering symbol/punctuation mode (Fig. A1).
Gestures provided on the standard quick reference guide for Jot including those used to write letters, numbers, and the symbols corresponding to the 32 symbols available using the punctuation shift mechanism in Graffiti. A space is generated by entering the stroke for a dash without entering symbol/punctuation mode (Fig. A2).
References
Chang, L., MacKenzie, I.S., 1994. A comparison of two handwriting recognizers for pen-based computers. Proceedings of CASCON’94, Toronto, IBM Canada, pp. 364–371.
Frankish, C., Hull, R., Morgan, P., 1995. Recognition accuracy and user acceptance of pen interfaces. Proceedings of CHI’95, ACM Press, New York, pp. 503–510.
Goldberg, D., Richardson, C., 1993. Touch-typing with a stylus. Proceedings of InterCHI’93, ACM Press, New York, pp. 80–87.
LaLomia, M.J., 1994. User acceptance of handwritten recognition accuracy. Conference Companion CHI’94, ACM Press, New York, p. 107.
Lewis, J.R., 1999. Input rates and user preference for three small-screen input methods: standard keyboard, predictive keyboard, and handwriting. Proceedings of the Human Factors and Ergonomics Society 43rd Annual Meeting, pp. 425–428.
MacKenzie, I.S., Chang, L., 1999. A performance comparison of two handwriting recognizers. Interacting with Computers 11, 283–297.
MacKenzie, I.S., Zhang, S.X., 1997. The immediate usability of Graffiti. Proceedings of Graphics Interface’97, Canadian Information Processing Society, Toronto, pp. 129–137.
MacKenzie, I.S., Nonnecke, B., McQueen, C., Riddersma, S., Meltz, M., 1994a. A comparison of three methods of character entry on pen-based computers. Proceedings of the Human Factors and Ergonomics Society 38th Annual Meeting, Human Factors Society, Santa Monica, CA, pp. 330–334.
MacKenzie, I.S., Nonnecke, B., Riddersma, S., McQueen, C., Meltz, M., 1994b. Alphanumeric entry on pen-based computers. International Journal of Human-Computer Studies 41, 775–792.
McQueen, C., MacKenzie, I.S., Nonnecke, B., Riddersma, S., Meltz, M., 1994. A comparison of four methods of numeric entry on pen-based computers. Proceedings of Graphics Interface’94, Canadian Information Processing Society, Toronto, pp. 75–82.
McQueen, C., MacKenzie, I.S., Zhang, S.X., 1995. An extended study of numeric entry on pen-based computers. Proceedings of Graphics Interface’95, Canadian Information Processing Society, Toronto, pp. 215–222.
Pratt, F., 1939. Secret and Urgent: The Story of Codes and Ciphers. Blue Ribbon Books.
Soukoreff, W.R., MacKenzie, I.S., 2001. Measuring errors in text entry tasks: an application of the Levenshtein String Distance statistic. Extended Abstracts of CHI 2001, ACM Press, New York, pp. 319–320.
Tappert, C.C., Suen, C.Y., Wakahara, T., 1990. The state of the art in on-line handwriting recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 12 (8), 787–808.
© 2002 Published by Elsevier Science B.V. TI - Data entry for mobile devices: an empirical comparison of novice performance with Jot and Graffiti JF - Interacting with Computers DO - 10.1016/S0953-5438(01)00060-1 DA - 2002-10-01 UR - https://www.deepdyve.com/lp/oxford-university-press/data-entry-for-mobile-devices-an-empirical-comparison-of-novice-o2A2P5Aq0X SP - 413 EP - 433 VL - 14 IS - 5 DP - DeepDyve ER -