Abstract Dictionaries sometimes include pictorial illustrations to complement verbal explanation. The present study examines the question of how verbal and pictorial elements within an entry compete for dictionary users’ attention, and how this competition affects meaning extraction and retention. The study employs eye-tracking technology to analyse the gaze patterns of Polish learners of English consulting illustrated monolingual English learner dictionary entries. In addition, success in identifying and remembering meaning based on illustrated items is assessed and related to eye-tracking data. 1. Introduction According to Stein (1991: 99), dictionary illustrations and their relationship to the definition are fascinating topics that, for a long time, had been all but ignored in lexicographic studies. Though this was perhaps too harsh a judgment — even at the time, given Ilson (1987) and Hupka (1989) — it is undoubtedly true that we have yet much to learn about the role of pictorial illustrations in dictionaries. In a recent authoritative handbook of lexicography, lexicographic illustration is defined by Klosa (2015: 516) as follows: ‘An illustration is a particular kind of image which is used in conjunction with a text and which decorates, illustrates, or explains the text.’ Of the three verbs featuring in this definition, decorating — an aesthetic function — is of secondary importance in a dictionary viewed as a chiefly utilitarian object (though decoration may be of value insofar as it contributes to the symbolic function of some dictionaries). Saying that an illustration illustrates is uselessly tautological. 
Thus, it is the third verb, explaining the text, that seems to express the most central function of lexicographic illustration. The primacy of explanation is also confirmed by Klosa further down the page when she writes: ‘At best, there is a complementary relationship between the definition of a headword and the illustration, so that the whole meaning can be ascertained from the definition and the illustration, with the illustration completing the text and vice versa.’ (Klosa 2015: 516). The latter quotation brings into focus another important aspect of lexicographic illustration: its complementary relationship to textual elements, above all the definition.

However, Kemmer (2014b) voiced a concern that pictorial illustration, by virtue of its appeal to a more biologically fundamental mode of perception and the resulting ‘force of attraction’, might monopolize the attention of dictionary users, leaving the verbal components unused, and thus useless. Kemmer’s empirical findings alleviate her concerns, suggesting that pictorials do not after all marginalize the traditional definition. Kemmer dealt with native speakers of German and, in her eye-tracking study, used two rather technical illustrated entries (see below for more details). In the present contribution, we revisit the issue in the context of non-native dictionary users (Polish learners of English), rather than native speakers, consulting general-language items.

The illustrative pictorials dealt with here are one-off illustrations found under individual dictionary entries: what Hupka (2003: 363) and Svensén (2009: 303) called single illustrations.
Their primary role in dictionary entries is to support text comprehension, while text production and directed vocabulary learning may benefit from scenic (Svensén 2009: 310; Klosa 2015: 519) or synoptic (Luna 2013) illustrations, which present a set of thematically related objects in a given context, such as bathroom fixtures and accessories, or vegetables in a shopping display. We shall not be concerned with the latter type of illustration here. For a finer-grained breakdown of the types of pictorials in dictionary entries, see Kemmer (2014a) or Hupka (2003). There is some evidence that illustrations in entries may improve both the comprehension of lexicographic explanation (Nesi 1998) and recall of meaning (Gumkowska 2008). Neither of these studies, however, was able to determine the extent to which participants actually viewed the illustrations. This only became a possibility with the introduction of the eye-tracking technique. Eye-tracking research is capable of producing a detailed, relatively faithful and accurate record of where a dictionary user looks, for how long, and in what sequence. The approach rests on the eye-mind assumption (Just and Carpenter 1980), which states that gaze fixations reflect the cognitive effort of processing a stimulus. With fairly good approximation, it can be assumed that the gaze reflects meaningful processing of the objects viewed. While the technique has been used quite extensively in reading research, its use in lexicography is still at best sporadic, and very recent. Three early studies were done on digital Danish dictionaries (Simonsen 2009a, 2009b, 2011), and two looked at Japanese learners of English (Kaneta 2011; Tono 2011). Summaries of these studies may be found in Lew et al. (2013), which also reports on an original eye-tracking study of Polish learners of English consulting bilingual dictionary entries. 
The most recent and most relevant work is Kemmer (2014a, 2014b), as she focused specifically on the interplay of definitions and pictorial illustrations in dictionary entries. Her expectation was that pictorials would attract the gaze of dictionary users to the extent that they would only turn to definitions after they had viewed the picture; or perhaps they would even ignore definitions altogether. Kemmer saw this as a potential problem of pictures in entries, since not all desirable aspects of meaning can typically be conveyed in this way, and there may be important clues in the definition that are in danger of being missed.

In her study, Kemmer presented 38 native speakers of German with two dictionary entries for semi-specialized terms: das Schneckengetriebe ‘worm gear’ and der Pfahlstich ‘bowline knot’. Responses were elicited through multiple-choice options. In addition, each of the two entries was presented with two different prompts: a general one, asking participants to use the entry to learn what the object was; and a more focused instruction, specifically mentioning the constitution, appearance, and function of the object defined in the entry. The study found that in neither case were definitions ignored by the participants, who would always spend more time viewing definitions than looking at pictures. However, Kemmer expressed a valid concern that two stimuli (each presented with two prompts) were hardly a sufficient sample to generalize about image versus text preferences in consultation behaviour, as it is hard to say to what extent the patterns observed reflect the idiosyncrasies of the specific items used and whether they can be generalized to other entries. Therefore, in this study we use a distinctly larger number of stimuli (17). We now turn to the description of the present study.

2. The study
2.1.
Aims

The general aim of this study was to follow up on Kemmer’s pioneering work, exploring how dictionary users’ attention is divided between definition and illustration, as evidenced by gaze patterns. We wanted to investigate this using a larger number of entries taken from an authoritative monolingual dictionary for learners of English, with learners of English as participants. Compared to Kemmer’s study, the number of items was increased to 17. We also wanted to see whether the type of illustration made a difference, in the sense of the objects being depicted in a typical context, or without such a context. Crucially, the viewing of entries is not an end in itself in receptive dictionary consultation: learning the meaning is, in both senses of the word: (1) discovering the meaning from the entry when the word is consulted; and (2) storing this information for later use, so that the (erstwhile) dictionary user might proceed without lexical consultation on the next encounter. We addressed both these outcomes in the study.

The specific research questions were as follows:
Is there a typical viewing pattern for illustrated entries?
Are definitions neglected at the cost of pictorial illustrations?
What do participants tend to look at first: illustration or definition? Is this choice related to type of illustration?
To what extent do participants switch between pictures and definitions, and is there much variation in the switching?
How does entry viewing time (dwell time) vary by item and by participant?
Are dwell times for entry, definition, and picture related to type of illustration?
What is accuracy of equivalent provision related to?
Is meaning retention facilitated by attentive processing of illustrations in entries?

2.2. Participants

Participants were all first-year undergraduates at a large public Polish university, all majoring in English. Their English proficiency was upper-intermediate or B2 according to CEFR (Council of Europe 2011).
Ten participants took part in the study, but data from one participant were of low technical quality and were discarded; another participant was familiar with too many of the items, so in the end data from eight participants were used. 2.3. Materials 2.3.1 Test items. Thirty illustrated entries were initially selected as candidate items for the study from LDOCE Online. The candidate items were chosen from the complete set of all available illustrated entries in the dictionary that met the following conditions: (1) they were concrete nouns; (2) they were relatively infrequent; (3) they were likely to be unknown to participants at this level; (4) the pictorial illustration was clear and appeared to communicate the meaning well. In addition, half of the items (N = 15) had illustrations that depicted the defined objects in isolation (see Figure 1), whereas pictorials for the other half (N = 15) presented the objects in context (structural illustrations as per Svensén 2009: 305): for example, the picture for the noun banister represented the object in the natural context of a staircase (see Figure 2). In assessing the frequency of candidate items, reference was made to the COCA and BNC corpora. The initial thirty candidate items were subsequently rated for their familiarity and illustrability. 
Figure 1. Sample stimulus for an entry with pictorial illustration presenting the object defined in isolation

Figure 2. Sample stimulus for an entry with pictorial illustration presenting the object defined in context, with Areas of Interest (AoI’s) marked for entry text, definition, and illustration

To eliminate items that might be too familiar, the level of familiarity of the thirty candidate items was assessed with a pen-and-paper Familiarity Assessment Task (FAT, see Supplementary Data), using a group of twenty students in the same study programme and level as the study participants. Participants in the FAT were asked to rate their level of knowledge of the candidate items and provide a Polish equivalent or description of the meaning. Based on the responses, items were given familiarity scores. Immediately after the FAT, the same group of students rated the illustrability of candidate items via a pen-and-paper Illustration Assessment Task (IAT, see Supplementary Data). The thirty candidate items were presented each with a Polish equivalent and pictorial illustration from its relevant LDOCE Online entry. Students were asked (in Polish) to rate how helpful the accompanying illustrations were in conveying the meaning of the English words, on a six-point semantic differential scale going from 1 (completely unhelpful) to 6 (very helpful). Items with the lowest mean IAT scores (below 3.4) were discarded, as were those that, in the FAT, turned out to be familiar to more than 3 out of 20 students, leaving 18 items for the eye-tracking study.
However, due to a technical error, one of the items did not yield usable data, thus leaving the final list of usable items at 17. This number included eight items with illustrations presenting objects in isolation (blowtorch, colander, decanter, forceps, hinge, plunger, sundae, and tankard), and nine with in-context illustrations (bookend, banister, draughtboard, gazebo, mitt, nib, pestle, playpen, stopper). Item stimuli (see Figure 1 and Figure 2) reproduced single illustrated senses of the source LDOCE Online entries, retaining the original layout but at 175 percent magnification relative to the default size. Any remaining senses were ignored. The stimuli were all normalized to a size of 840 x 353 pixels and centred on the screen. For each stimulus, three Areas of Interest (AoI’s) were defined, marking out the basic structural elements of an entry (see Figure 2): verbal definition, pictorial illustration, and the textual part of the entry, including the definition and the headword with its grammar label. The predefined AoI’s were later used to compute gaze times and fixation counts within relevant regions of each stimulus. An Item Recall Test (IRT, see Supplementary Data) consisted of a printed list of target headwords with spaces for Polish equivalents. There were four versions of the IRT, each with a different randomized order of items.

2.4. Apparatus

The eye-tracker used in the experiment was a Tobii model T60, with no head restraint. In this model, the tracking cameras are integrated into the monitor and thus are not obvious to participants, which helps to ensure a relatively naturalistic setting of the experiment. The software used to design the study and collect the eye movement data was Tobii Studio, version 2.0.8; a newer non-recording version 3.4.5 was used for further data processing. We used bilateral eye-tracking (both eyes were tracked).

2.5. Procedure

The data collection took place between January and March, 2015.
In a series of individual sessions, participants were seated in a spacious daylit office, in front of a desk, at a distance of 60 centimetres from the unit. They were instructed to sit comfortably, leaning against the backrest and keeping one hand on the desk and the other on the space bar to self-pace the advance of the stimuli. After the procedure was explained and clarified when needed, a trial run was executed, which included a five-point calibration procedure, followed by an instructions screen and then seven items similar to those used in the actual experiment (some of the unused items from the initial selection). Participants were then asked if they understood their task and whether they had any questions. Once all was clear, they proceeded to the main experimental session. The main experimental session began with a five-point calibration followed by 20 screens: the first was again the instruction, the next eighteen screens contained the stimuli, in randomized order, and a closing screen. Participants were asked to say aloud a Polish equivalent of each item and then proceed to the next stimulus by pressing the space bar. After completing the eye-tracking session, each participant was given an unannounced Item Recall Test, in which they were asked to supply in writing a Polish equivalent for each of the items used in the study. Before leaving the office, each participant would receive a debriefing sheet: a paper copy of all the headwords, illustrations and their Polish translations. 2.6. Data analysis Overall, there were 17 usable stimuli, each viewed by 8 participants, yielding a total of 136 individual recordings with synchronized eye-tracking and audio channel data. All these recordings were inspected manually multiple times, noting any patterns, regularities, or irregularities. Audio recordings were transcribed and rated for equivalent accuracy (correct or incorrect), independently by two experimenters. 
Semantically correct glosses were also accepted, even if they were not set lexical units in Polish. Likewise, recall responses were keyed into a spreadsheet and rated in the same way as for equivalent accuracy, except that incorrect equivalents that were nevertheless successfully retained were counted as correct, since recall was intended as a measure of the retention of meaning (correct or not). Interrater agreement was 96 percent for equivalent provision and 98 percent for recall, and the isolated differences were resolved in a joint session.

In computing fixations, a velocity-based I-VT fixation classifier (Komogortsev et al. 2010) was used with the following settings. Co-ordinates from the two eyes were averaged, with interpolation enabled for missing tracking data (blinking and other occasional tracking interruptions) for a maximum gap of 75 ms. No noise reduction was used (it would be problematic given the relatively slow sampling rate of 60 samples per second). Velocity was calculated over a 20 ms window, with a threshold of 30 degrees per second. Close adjacent fixations were merged if spaced apart by no more than 0.5 degrees within 75 ms. Minimum fixation duration was set at 60 ms: this fairly low threshold was motivated by unpublished experimental data obtained by the first author which suggested that skilled dictionary users, in viewing entries, may produce legitimate gaze fixations that are distinctly shorter than those found in ordinary text reading.

Raw fixation data with accompanying AoI information were exported as a structured text file and subsequently processed in R (R Core Team 2015), also using dplyr (Wickham and Francois 2015) to restructure the data, compute new measures and aggregate measures across recordings, items and participants. Mixed effects models were fitted using lme4 (Bates et al. 2015), also drawing on afex (Singmann et al. 2015).
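The fixation settings described above can be illustrated with a minimal velocity-threshold sketch in Python. This is not the classifier implemented in Tobii Studio: it assumes gaze samples already expressed in degrees of visual angle, uses simple sample-to-sample velocity in place of the 20 ms window, and omits gap interpolation and the merging of adjacent fixations. The function name `classify_fixations` and its parameters are hypothetical.

```python
import math

def classify_fixations(samples, rate_hz=60.0, velocity_threshold=30.0,
                       min_duration_ms=60.0):
    """Group gaze samples into fixations with a simple velocity threshold.

    samples: list of (x_deg, y_deg) gaze positions in degrees of visual angle
    (an assumption of this sketch; raw tracker output is usually in pixels).
    """
    dt = 1.0 / rate_hz
    # The first sample has no predecessor, so it is treated as slow.
    is_slow = [True]
    for (x0, y0), (x1, y1) in zip(samples, samples[1:]):
        velocity = math.hypot(x1 - x0, y1 - y0) / dt  # degrees per second
        is_slow.append(velocity < velocity_threshold)

    fixations = []
    start = None
    # A trailing False closes any run that is still open at the end.
    for i, slow in enumerate(is_slow + [False]):
        if slow and start is None:
            start = i
        elif not slow and start is not None:
            duration_ms = (i - start) * dt * 1000.0
            if duration_ms >= min_duration_ms:  # drop runs that are too short
                run = samples[start:i]
                fixations.append({
                    "start": start,
                    "end": i - 1,
                    "duration_ms": duration_ms,
                    "x": sum(p[0] for p in run) / len(run),
                    "y": sum(p[1] for p in run) / len(run),
                })
            start = None
    return fixations
```

At 60 samples per second, a run of four slow samples spans about 67 ms and is kept under the 60 ms minimum, whereas a run of three (50 ms) is discarded.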
For plots, apart from the base graphics functionality in R, we used sciplot (Morales 2012) and lsr (Navarro 2015). Classification tree analysis was also done but will not be reported on here.

3. Results and discussion

3.1. Typical gaze pattern

In the majority of cases, gaze patterns were fairly similar and predictable. A participant would often begin with a glance at the headword, then cast a brief look at the illustration, and then read the definition. At this point, the participant either supplied an equivalent and proceeded to the next stimulus, or went on to examine the illustration again, and only then gave an answer. A typical gaze pattern is illustrated in Figure 3 (participant C, entry playpen). In a minority of cases, however, more involved gaze patterns were noted, with many fixations and switching back and forth between the definition and illustration, as for the same participant viewing the entry for bookend in Figure 4.

Figure 3. A typical gaze pattern (item playpen, participant C)

Figure 4. A gaze pattern exhibiting a large number of fixations and switches between definition and picture (item bookend, participant C)

3.2. Were definitions neglected in the presence of pictorial illustrations?

A fundamental question, also posed by Kemmer (2014b), was whether pictorial information would attract the attention of participants to the point where they would completely ignore the verbal definition. In our study, there were only five cases across the 136 recordings in which the participant did not look at the definition at all, and all five came from the same participant, A.
This was also the only participant whose total dwell time on pictures exceeded dwell time on definition (57 percent of total dwell time, the mean for the other participants being 37 percent). Clearly, this one participant exhibited an unusual preference for pictures. In all other cases, participants always looked at both the definition and the illustration. Which of these elements attracted their attention first, however?

3.3. Were pictures or definitions viewed first?

Our data indicate that the choice of whether to examine the picture or definition first is largely a matter of personal preference: a specific entry-viewing strategy. Of the eight participants, three looked at pictures first, two exhibited a more balanced choice, with a slight preference towards pictures, and three clearly preferred definitions (see Table 1). Beyond these personal strategies, however, it can be noted that for three specific items (blowtorch, bookend, and playpen), all but one participant went for the definition first. The remaining 14 items were balanced in this regard, with roughly half of the participants starting with the picture, and the other half with the definition. Overall, there was more variation due to participant (sdP = 5.3) than item (sdI = 1.4, F(7,16) = 3.70, p = 0.015).
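As a rough check on the two spread figures, the sketch below computes standard deviations of the picture-first totals from Table 1. It assumes that sdP and sdI are population standard deviations of those totals; the text does not state how they were computed, so this reading is an inference. The variable names are illustrative.

```python
from statistics import pstdev

# Picture-first totals copied from Table 1.
picture_first_by_participant = [10, 2, 15, 1, 14, 13, 9, 3]  # participants A-H
picture_first_by_item = [5, 1, 1, 5, 4, 5, 4, 5, 5, 5, 5, 4, 1, 4, 5, 4, 4]  # banister ... tankard

# Population standard deviation (divisor n), an assumption of this sketch.
sd_participant = pstdev(picture_first_by_participant)
sd_item = pstdev(picture_first_by_item)

print(f"sd by participant: {sd_participant:.1f}")
print(f"sd by item: {sd_item:.1f}")
```

Under this assumption the rounded values agree with the figures quoted above. The sample standard deviation (divisor n − 1) gives slightly larger numbers.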
Table 1. Choice of picture (P) or definition (D) as the first entry element examined (by item and by participant)

Item                                 A    B    C    D    E    F    G    H    Picture first (by item)   Definition first (by item)
banister                             P    D    P    D    P    P    P    D    5                         3
blowtorch                            D    D    D    D    P    D    D    D    1                         7
bookend                              D    D    D    D    P    D    D    D    1                         7
colander                             P    D    P    D    P    P    P    D    5                         3
decanter                             P    D    P    D    P    D    D    P    4                         4
draughtboard                         P    P    P    D    P    P    D    D    5                         3
forceps                              D    D    P    P    P    P    D    D    4                         4
gazebo                               P    P    P    D    P    P    D    D    5                         3
hinge                                P    D    P    D    P    P    P    D    5                         3
mitt                                 P    D    P    D    P    P    P    D    5                         3
nib                                  P    D    P    D    P    P    P    D    5                         3
pestle                               P    D    P    D    P    P    D    D    4                         4
playpen                              D    D    P    D    D    D    D    D    1                         7
plunger                              D    D    P    D    D    P    P    P    4                         4
stopper                              P    D    P    D    P    P    P    D    5                         3
sundae                               D    D    P    D    P    P    P    D    4                         4
tankard                              D    D    P    D    D    P    P    P    4                         4
Picture first (by participant)       10   2    15   1    14   13   9    3    67
Definition first (by participant)    7    15   2    16   3    4    8    14                             69

3.4. Did the choice of first element depend on the presence of context in the picture?

As some of our pictures presented the object defined in a natural context, this may have influenced whether the picture or the definition was viewed first. A contingency table for the choices listed in Table 1 is given in Table 2.
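The cross-tabulation and its odds ratio can be derived from the per-item counts in Table 1 and the item classification given in the Materials section. The following is a minimal sketch, not the original R analysis. The orientation of the odds ratio (odds of definition-first without context against odds with context) is an assumption, since the text does not say which way the ratio was taken.

```python
# (picture-first, definition-first) counts per item, copied from Table 1.
first_choice = {
    "banister": (5, 3), "blowtorch": (1, 7), "bookend": (1, 7),
    "colander": (5, 3), "decanter": (4, 4), "draughtboard": (5, 3),
    "forceps": (4, 4), "gazebo": (5, 3), "hinge": (5, 3),
    "mitt": (5, 3), "nib": (5, 3), "pestle": (4, 4),
    "playpen": (1, 7), "plunger": (4, 4), "stopper": (5, 3),
    "sundae": (4, 4), "tankard": (4, 4),
}

# Items whose illustration shows the object in isolation (context absent).
isolated = {"blowtorch", "colander", "decanter", "forceps",
            "hinge", "plunger", "sundae", "tankard"}

table = {
    "absent": {"picture": 0, "definition": 0},
    "present": {"picture": 0, "definition": 0},
}
for item, (picture, definition) in first_choice.items():
    context = "absent" if item in isolated else "present"
    table[context]["picture"] += picture
    table[context]["definition"] += definition

# Odds of definition-first (versus picture-first) in each context condition.
odds_absent = table["absent"]["definition"] / table["absent"]["picture"]
odds_present = table["present"]["definition"] / table["present"]["picture"]
odds_ratio = odds_absent / odds_present

print(table)
print(f"odds ratio: {odds_ratio:.2f}")
```

Taking the ratio the other way round gives the reciprocal, which is equally close to 1.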
It is clear that the observed frequencies give no trace of evidence that the presence or absence of the context in the picture is related to it being chosen or not as the first entry element examined (the odds ratio OR = 1.06 is very close to 1, and even an extremely narrow 10% confidence interval still includes a value of 1: CI10 = (0.96, 1.18)).

Table 2. Choice of definition or picture as the first element examined, cross-tabulated by whether the picture included context; the frequencies are almost perfectly balanced, suggesting that the choice was independent of the presence of context.

                  Context
First element     Absent    Present
Definition        33        36
Picture           31        36

The frequencies in Table 2 are not strictly speaking entirely independent, as they are contributed by eight participants for 17 items, but such a violation of data independence raises the risk of a falsely significant result (as underlying patterns surface in the numbers), whereas in our case we have a result that is very far from significance. To be on the safe side, however, a binary logistic model with a logit link function was fitted on the choice of first element (picture or definition), with picture context as binary predictor, random intercept for item, and random intercept and slope for participant. The effect of context was very far from significant, so there is no evidence that picture context played a role in participants’ initial choices. The next question is whether high illustrability of an item made it more likely for pictures to be examined first.

3.5. Did the choice of first element viewed depend on illustrability?

Items vary in terms of how readily they can be represented in a pictorial illustration. One might expect that for highly illustrable items participants would be more inclined to examine pictorial information first, and this did indeed seem to be the case.
Figure 5 presents the distribution of illustrability scores categorized by whether a picture (top plot) or definition (bottom plot) was viewed as the first entry element. The notches on the two box plots do not overlap, which suggests that illustrability of the items is not independent of the participants’ choices of the first element to be viewed.

Figure 5. Box plots (with individual data points shown) of item illustrability scores for all recordings, where picture (top plot) or definition (bottom plot) was examined first

To confirm that this is the case, two binary logistic regression models were fitted with the choice of first element as a binary response variable (indicating whether the first element examined was a picture or a definition). The null model was intercept-only, while the alternative model included illustrability as a continuous predictor. The richer model, which included the illustrability term, had a lower Akaike Information Criterion (AIC_I = 188.37) than the intercept-only model (AIC_0 = 190.51). An analysis of deviance indicated a significant reduction of residual deviance when illustrability was added to the model (D_I = 184.37, D_0 = 188.51, ΔD = 4.14, Δdf = 1, p = 0.04, using the χ² distribution). The conclusion then is that illustrability played a role in the selection of first entry element to examine, with more illustrable items attracting more first views to the picture element. The mean illustrability score was 4.71 (sd = 0.63) when picture was the first element viewed, as against 4.48 (sd = 0.69) when definition was the first element examined.

3.6.
Switching between pictures and definitions

There was also variation in the amount of switching between pictures and definitions that participants engaged in, with a median number of switches per consultation being 4. Detailed data are reported in Table 3. Again, there is clearly more variation due to participant (sdP = 48.9) than due to item (sdI = 13.0, F(7,16) = 3.76, p = 0.013), yet on further investigation it turned out that the rate of switching is a simple, and very faithful, reflection of the degree of attention given to the entries by the individual participants, as measured by total fixation count (Pearson’s r = 0.96, p < 0.001). This nearly perfect correlation indicates that participants switch more between the two entry components when they spend more time consulting the entry in general, and in fact there is no independent significance to the switching behaviour beyond the degree of attention directed at an entry.

Table 3. Transitions between pictures and definitions (by item and by participant)

Item                    A    B    C    D    E    F     G    H    Total by item
banister                0    1    2    3    7    8     5    5    31
blowtorch               2    4    14   6    8    18    6    6    64
bookend                 1    4    9    2    8    13    6    4    47
colander                2    2    2    2    2    2     3    5    20
decanter                2    2    2    6    4    5     3    6    30
draughtboard            2    3    4    2    9    6     6    4    36
forceps                 4    6    2    1    6    4     1    2    26
gazebo                  0    1    4    2    1    3     2    0    13
hinge                   0    4    8    2    6    22    5    1    48
mitt                    0    4    6    2    7    4     5    4    32
nib                     3    2    4    2    6    18    1    6    42
pestle                  0    2    4    2    7    28    2    1    46
playpen                 3    4    5    2    4    8     4    12   42
plunger                 0    12   2    0    4    28    11   3    60
stopper                 2    2    2    2    4    8     3    5    28
sundae                  0    2    6    2    6    6     8    3    33
tankard                 1    2    2    2    3    8     5    5    28
Total by participant    22   57   78   40   92   189   76   72   626

3.7. Dwell times on entry elements

3.7.1. Total entry dwell time.

Total entry dwell time (i.e. the time spent examining any part of an entry) turns out to be very highly correlated with total fixation count (Pearson’s r = 0.94, p < 0.001, CI95 = [0.92, 0.96]), so these two measures are largely interchangeable, at least in the context of the present study. Of the two, dwell time is preferred because, being a continuous measure, it holds more information than a discrete count of fixations, which ignores their duration. An average participant spent close to 9 seconds examining an average entry, of which 3.5 seconds (40 percent) fell within the pictorial illustration.

Time spent examining specific entries varied a great deal by both item and participant. Clearly, some items attracted attention for much longer than others, at least for some participants. Figure 6 gives the distribution of entry dwell times for the individual items (on a log scale). It is given in the form of a box plot, where each thick line signifies the median value, the bottom and top of each light-grey box represent the lower and upper quartile, respectively (so that the box covers half the data points), and the whiskers extend 1.5 times the interquartile distance below and above the lower/upper quartile, respectively. Certain items (e.g. blowtorch and plunger) exhibit broader ranges of dwell times, indicating that some participants took much longer with them than others. Other entries, such as mitt, stopper or tankard, are characterized by low variability across participants.
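The box-plot elements just described can be computed as in the sketch below. Per-recording dwell times are not tabulated in the text, so the sketch substitutes the switch counts for plunger and mitt from Table 3, which show a similar contrast between a widely spread and a tightly clustered item. The quartile method (`inclusive`) is an assumption, as plotting packages differ in how they define quartiles, and many draw each whisker only as far as the most extreme data point inside the fence computed here.

```python
from statistics import median, quantiles

# Switch counts per participant (A-H), copied from Table 3 as stand-in data.
switches = {
    "plunger": [0, 12, 2, 0, 4, 28, 11, 3],
    "mitt": [0, 4, 6, 2, 7, 4, 5, 4],
}

def box_summary(values):
    """Return the median, quartiles and 1.5 * IQR fences for one box."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "median": median(values),
        "q1": q1,
        "q3": q3,
        "lower_fence": q1 - 1.5 * iqr,
        "upper_fence": q3 + 1.5 * iqr,
    }

summaries = {item: box_summary(values) for item, values in switches.items()}
for item, summary in summaries.items():
    print(item, summary)
```

The wider box for plunger and the narrow one for mitt mirror the contrast between broad and low variability noted in the text.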
Figure 6. Box plot of mean entry dwell time for the individual items

A different story is told by a box plot of entry dwell times for the individual participants (see Figure 7, also on a log scale). This plot reflects the participants’ individual strategies in examining the entries. Participants A, C, G, and (especially) H were consistently fast. In contrast, participants E and F were slower on average, and took more time with some of the items.

Figure 7. Box plot of mean entry dwell times for the individual participants

Insofar as dwell time reflects visual attention, it was of interest to establish whether this measure was related to illustrability scores or the presence of pictorial context. To address this question, we fitted a linear mixed model (using lmer), regressing total entry dwell time on illustrability scores and presence of context, including in the model random intercepts for participants and items, which allows the isolation of fixed effects of interest, correcting for the random variability due to participant and item (on the benefits of using this approach see Ptasznik and Lew 2014). The interaction between illustrability and context was not significant and was therefore removed from the model. Neither of the main effects (illustrability or context) was significant. These two fixed effects are plotted in Figure 8, with model-predicted values and a 95% confidence band (illustrability) and 95% confidence intervals (context). There was a tendency for low-illustrability items to attract longer entry dwell time, likewise for context-absent illustrations, though neither of these tendencies was significant after correcting for the effects of item and participant.
Using a natural logarithm of entry dwell time as the response variable did not produce qualitatively different results.

Figure 8. Main effects of illustrability (left) and presence of context (right) on total entry dwell time (with 95% confidence intervals and confidence band)

3.7.2 Definition and picture dwell times.

No significant effects or interactions were found for dwell times on definition, or dwell times on picture, using as input variables illustrability scores and presence of context, with random intercepts for participants and items. However, absolute time spent examining illustrations was to a large extent a reflection of the total entry dwell time (rPearson = 0.88, p < 0.001). To correct for this and isolate a measure of relative attention given to pictorial illustration, in the next step we examined relative dwell time on pictorial illustrations, expressed as the ratio of time spent on pictures to total entry dwell time.

3.7.3 Picture relative dwell time.

The question of interest was whether the relative attention that participants devoted to pictorial illustrations in the entries would be affected by illustrability and/or the presence of context in the pictorial illustration. To address this question, a linear mixed model was fitted, regressing relative picture dwell time (i.e. the fraction of time spent examining pictures relative to total entry dwell time) on illustrability scores and presence of context, with random intercepts for participants and items. The interaction between illustrability and context was removed from the model after it was found non-significant. Neither of the main effects in the re-fitted model turned out to be significant. The two fixed effects are illustrated in Figure 9.
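The response variable in this model is a simple ratio, and a minimal sketch may help fix the definition. The function name relative_picture_dwell is hypothetical; the two input values are the averages reported in Section 3.7.1 (3.5 seconds on the picture out of close to 9 seconds on the entry), so the result only approximates the reported 40 percent.

```python
def relative_picture_dwell(picture_dwell_s, entry_dwell_s):
    """Fraction of the total entry dwell time that fell on the picture."""
    if entry_dwell_s <= 0:
        raise ValueError("total entry dwell time must be positive")
    return picture_dwell_s / entry_dwell_s


# Averages taken from Section 3.7.1; "close to 9 seconds" is rounded to 9.0 here.
share = relative_picture_dwell(3.5, 9.0)
print(f"{share:.1%} of entry dwell time spent on the picture")
```

In the study the ratio is computed for each consultation and then modelled, so the figure above, which is a ratio of two averages, is only a rough guide to the typical per-consultation value.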
The relative time spent examining pictorial elements of the entry falls slightly with increasing illustrability, and it is about a quarter higher for the richer context-present illustrations. Removing illustrability from the model did not result in a significantly worse fit. Removing context produced a significantly higher residual deviance (χ² = 4.3, df = 1, p = 0.04, using a likelihood ratio test) compared to a null model with random intercepts for participants and items. The Bayesian and Akaike Information Criteria point in opposite directions, though, making a definite conclusion difficult in this case. Tentatively, we might conclude that the presence of context in the illustration attracted relatively more attention to the pictorial illustration. Comparing the right-hand panels of Figure 8 and Figure 9, we may say that participants did not spend more time examining entries with context-present pictures in absolute terms (the reverse tendency is apparent), but a larger portion of the total time (higher by 30%) was spent examining in-context pictures.

Figure 9. Main effects of illustrability (left) and context (right) on picture relative dwell time (with 95% confidence intervals and confidence band)

3.8. Equivalent provision and recall

3.8.1 Equivalent provision and recall by item.

An average item received a correct equivalent in 76 percent of the cases, and was recalled correctly in 41 percent of the cases. Unsurprisingly, there was substantial variation between items in terms of equivalent provision as well as equivalent recall (see Figure 10). Some items turned out to be easy to name in Polish based on the entries (e.g. banister, gazebo, mitt, plunger). Others (blowtorch, nib, pestle) turned out to be more difficult.
As for participants’ ability to recall the equivalent they had supplied, items that were easy to remember included blowtorch, bookend, sundae and plunger, while the least remembered items were colander, decanter, pestle, and tankard. Items that were easy to name were not necessarily easy to remember (compare banister and plunger). In two cases (blowtorch and bookend) the recall rate was higher than the initial equivalent provision rate.

Figure 10. Accuracy rates for equivalent provision and equivalent recall rate by item

3.8.2 Provision and recall by participant.

Naturally, there was also variation between the participants in the accuracy of equivalent provision and in equivalent recall rate (see Figure 11). Participant C was the most accurate in equivalent provision, while A had the highest recall rate.

Figure 11. Relative accuracy of equivalent provision and equivalent recall rate by participant

3.9. What is equivalent provision accuracy related to?

In order to find out what measures equivalent provision was related to, a series of generalized linear mixed models, assuming a binomial distribution of residuals and using a logit link function, was fitted using glmm for accuracy of equivalent provision, with entry dwell time, context, illustrability and relative dwell time as fixed effects, and random intercepts for participants and items. No significant fixed effects or interactions were found. Next, equivalent provision accuracy rates were computed for each item, and these values were regressed on entry dwell time and presence of context.
Presence of context was not significant, but the interaction term of dwell time and presence of context in the regression was significant (F(1,13) = 5.5, p = 0.03). Dwell time was a highly significant simple term on its own (F(1,13) = 17, p < 0.01). To illustrate these effects, Figure 12 plots equivalent provision accuracy rates for all items, classified by presence of illustration context, with three regression lines: (1) for all 17 items (solid line); (2) for items with out-of-context illustrations only (dashed line, triangular data points); and (3) for items with in-context illustrations (dotted line, round data points). For all items irrespective of context, higher accuracy of equivalent provision was associated with shorter dwell time (b = –0.044, F(1,13) = 12, p = 0.03, rPearson = –0.67, Multiple R² = 0.45), but the regression slope was not significantly different from zero for items with out-of-context illustrations (b = 0.038, F(1,6) = 2.0, p = n.s., rPearson = –0.50, Multiple R² = 0.25). By contrast, the slope was highly significantly different from zero for items with in-context illustrations, resulting in a remarkably good fit (b = –0.07, F(1,7) = 29, p < 0.01, rPearson = –0.90, Multiple R² = 0.81). Each extra second taken examining an entry with an in-context illustration corresponded to a lowering of the accuracy rate by seven percentage points.

Figure 12. Plot of equivalent provision accuracy by mean entry dwell time with three regression lines: (1) for all items (solid line); (2) for items with out-of-context illustrations only (dashed line, triangular data points); and (3) for items with in-context illustrations (dotted line, round data points). Higher accuracy of equivalent provision was associated with shorter dwell time for in-context illustrations.
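The per-item regressions in this section relate one accuracy rate per item to one mean dwell time per item. The sketch below shows how a slope b and a Pearson correlation of that kind are obtained by ordinary least squares. It rests on assumptions: the five data points are invented for illustration and are not the study's items, and fit_line is a hypothetical helper rather than the authors' code.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least-squares fit of y on x; returns slope, intercept, Pearson r."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical per-item values (not study data): mean entry dwell time in
# seconds and the proportion of correct equivalents supplied for that item.
mean_dwell_s = [6.0, 7.0, 8.0, 9.0, 10.0]
accuracy = [0.90, 0.80, 0.75, 0.66, 0.60]

slope, intercept, r = fit_line(mean_dwell_s, accuracy)
# A slope of about -0.07 reads as: each extra second of dwell time goes with
# an accuracy rate lower by roughly seven percentage points.
print(f"b = {slope:.3f}, intercept = {intercept:.3f}, r = {r:.2f}")
```

Fitting the same helper separately to the in-context and out-of-context subsets would give the two additional lines of the kind drawn in Figure 12.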
3.10. Equivalent recall

A series of generalized linear mixed models, assuming a binomial distribution of residuals and using a logit link function, was fitted using glmm for accuracy of equivalent recall, with entry dwell time, context, illustrability, and fraction of dwell time on pictorial illustrations as fixed effects, and random intercepts for participants and items. No significant fixed effects or interactions were found. Recall rates were then calculated for each item, and a simple fixed-effect linear model was fitted for item recall rates, with per-item entry dwell time as predictor. In this model, the entry dwell time effect (i.e. the regression slope) turned out to be marginally significant (b = 0.038, F(1,15) = 4.20, p = 0.058, rPearson = 0.47, Multiple R² = 0.22). According to this model, the recall rate would increase by 0.038 for each extra second of entry dwell time (see Figure 13, solid line). This marginally significant effect would be consonant with the expectation that time spent on an item (sometimes called elaboration) should strengthen its memory trace. Presence of context was then added to the regression model, yielding separate intercepts and slopes for the two types of pictorial illustrations: out-of-context and in-context. The resulting regression lines are overlaid in Figure 13 as a dashed and a dotted line, respectively, with contributing items represented with triangles and dots, respectively. For items with illustrations in context, the slope was virtually zero (b = –0.001, F(1,7) = 0.03, p = n.s.,
rPearson = 0.02, Multiple R² = 0.0005). The dotted line is almost perfectly horizontal, suggesting that dwell time did not make any difference to recall for items illustrated in context. By contrast, the set of items illustrated with context-free illustrations (dashed line and items marked as triangles in Figure 13) exhibited a significant slope (b = 0.07, F(1,14) = 7.8, p = 0.03, rPearson = 0.75, Multiple R² = 0.57). For these items, longer examination time was related to improved recall.

Figure 13. Plot of recall rate by mean entry dwell time with three linear regression lines: (1) for all items (solid line); (2) for items with out-of-context illustrations only (dashed line, triangular data points); and (3) for items with in-context illustrations (dotted line, round data points). Longer entry dwell time improves meaning recall for items with out-of-context illustrations, but it has virtually no effect in the case of in-context illustrations.

It is intriguing that elaboration should have a strong effect on entries with illustrations without context, but none at all in the case of entries whose objects are drawn in their typical context. We also know that participants tended to spend a larger portion of their time examining in-context pictures compared to out-of-context illustrations.
Perhaps participants got distracted by the elements of the context (such as a human hand wearing a mitt, or a staircase supporting a banister) and effectively wasted their time examining these elements without strengthening their memory trace for the target item itself. Visual inspection of heat maps and gaze opacity plots for the in-context items did not produce a clear answer. Figure 14 presents a gaze opacity plot for the item nib, which is a visualization of cumulative gaze time by all participants. On the one hand, there is evidence of some gaze in the picture outside the strict area representing the nib, but on the other hand most of the gaze was focused around the nib proper. Another possible explanation of this intriguing effect might lie in the nature of those lexical items. The fact that they were illustrated in context may reflect the inherent difficulty of representing them unambiguously: without the physical context they might go unrecognized or be mistaken for another object, and it may be that this difficulty also hinders retention of meaning.

Figure 14. Gaze opacity plot for the item nib, all participants

4. Summary of the findings and conclusion

4.1. Summary of the findings

In conclusion, we will summarize the findings concisely, organizing them by the detailed research questions.

4.1.1 Is there a typical viewing pattern for illustrated entries?

A typical consultation took 9 seconds, about 40 percent of which was spent examining the picture. Normally, a participant would look both at the picture and definition, switching between them a small number of times (four on average). More problematic items took longer, with more fixations and transitions.

4.1.2 Are definitions neglected at the cost of pictorial illustrations?
Overall, there is no evidence that the presence of pictorial illustrations leads to the neglect of the verbal definition, a finding consonant with Kemmer (2014b), but based on a larger sample of items and a different category of dictionary users: language learners using L2 dictionaries. Rather, verbal and pictorial information seem to complement each other. This is evidenced by the non-trivial amount of switching between the two types of information, and also by the fact that the participants appear to have varied in the amount of relative attention given to the two components.

4.1.3 What do participants tend to look at first: illustration or definition? Is this choice related to type of illustration?

There was a surprising degree of balance in whether illustration or definition was examined first, and the little variation that did exist seems to have reflected individual participants’ preferred strategies. Certain items, however, appeared to attract the first gaze to the definition (or away from the picture?) for a distinct majority of participants. The choice of first element to view was independent of whether the item was presented in context or out of context. However, there did seem to be a preference for the picture as the first element viewed for items with higher illustrability scores.

4.1.4 To what extent do participants switch between pictures and definitions, and is there much variation in the switching?

There were, on average, four switches between pictures and definitions, though some items exhibited a much higher number, and one participant switched very little. However, it turns out that the number of switches reflected very closely the total fixation count, so there is probably no independent significance to the switching behaviour beyond general attention directed at an entry. This is a finding that might be of methodological relevance for future studies of similar phenomena.
4.1.5 How does entry viewing time (dwell time) vary by item and by participant? Are dwell times for entry, definition, and picture related to type of illustration?

There was some variation in entry dwell time by item, which likely reflected differences in the difficulty of arriving at an equivalent. Differences between participants were more pronounced, and identified individual entry consultation strategies. Low-illustrability items or items presented with out-of-context pictures tended to be viewed longer, although this tendency was not significant. No significant effects were found for absolute dwell times on the definition or picture elements. However, the fraction of time spent examining pictures, which reflects relative attention to pictures, was 30 percent higher for items presented with in-context pictures.

4.1.6 What is accuracy of equivalent provision related to?

Correct equivalents were supplied for 76 percent of the cases, and varied by item. Higher equivalent provision rates were related to shorter dwell times, though this effect was only significant for entries with illustrations of objects in context.

4.1.7 Is meaning retention facilitated by attentive processing of illustrations in entries?

Mean rate of recall was 41 percent: unsurprisingly lower than equivalent provision accuracy. It was not always the case that items for which equivalents were provided with high accuracy were also remembered well on the post-test. Longer entry dwell time was positively related to high recall rates (the reverse of what was found for equivalent provision), though only for items with out-of-context illustrations. In-context items exhibited no relationship between recall rates and entry dwell times.

4.2. Problems and prospects

To our knowledge, no findings are available that would allow us to set a standard with which to compare the durations of time spent examining dictionary definitions and pictorial illustrations accompanying the entries.
Paivio’s dual-coding theory (e.g. Paivio 1990) might suggest that the cognitive processing of pictorial information should take longer than in the case of text. On the other hand, dictionary definitions pose a particular challenge to dictionary users, especially language learners, and thus are likely to be more difficult to process not only than individual items (Paivio’s focus), but also than regular narrative text. Further, information in the types of pictorials we have used tends to be fairly spatially concentrated, which would make them potentially easier to process visually than the more spatially distributed definition. In view of all these complications, it would be risky to claim that equal periods of time spent examining definitions and pictures are equivalent in terms of cognitive processing. Perhaps extracting information from a pictorial illustration is more efficient than extracting it from a verbal definition, or perhaps it is the other way round. At this time, we know of no way to ascertain this, so any judgements on the relative importance of pictures and definitions ought to be tentative, though differences between entries or participants should still be meaningful.

An intriguing finding of this study was that longer time spent looking at entries was associated with better meaning retention only for items illustrated out of context. The association here was very strong, and the lack of an analogous effect for in-context illustrations was striking. We speculated that this might be due either to participants wasting their time looking at irrelevant elements of context, or else to the inherent nature of items that called for in-context illustration. One way of distinguishing between these two explanations would be to add illustration context to the out-of-context items and compare the performance of the original and manipulated items.
In future studies, it might be instructive to introduce entries with no pictorial illustration so as to assess how the presence of illustration affects the consultation time. There is evidence that pictures help in conveying (Nesi 1998) and storing (Gumkowska 2008) the meaning of foreign-language items, but is there a cost involved in the form of longer entry consultation time?

Notes

1 We will take illustration to mean pictorial illustration in the context of the present article, while acknowledging the occasional use of illustration to refer to lexicographic examples.

Acknowledgements

This work was supported in part by the Polish National Science Centre (Narodowe Centrum Nauki), under grant DEC-2013/09/B/HS2/01125 to Robert Lew. Ewa Tomczak is the recipient of the 2016/2017 scholarship for doctoral students from the Adam Mickiewicz University Foundation.

References

Longman Dictionary of Contemporary English Online. 2016. http://www.ldoceonline.com (LDOCE Online)
Bates D., Mächler M., Bolker B., Walker S. 2015. ‘Fitting Linear Mixed-Effects Models Using lme4.’ Journal of Statistical Software 67.1: 1–48.
Council of Europe. 2011. Common European Framework of Reference for Languages: Learning, Teaching, Assessment.
Gumkowska A. 2008. The Role of Dictionary Illustrations in the Acquisition of Concrete Nouns by Primary School Learners and College Students of English. BA Thesis, Collegium Balticum.
Hupka W. 1989. Wort und Bild. Die Illustrationen in Wörterbüchern und Enzyklopädien. (Lexicographica. Series Maior 22). Tübingen: Niemeyer.
Hupka W. 2003. ‘How Pictorial Illustrations Interact with Verbal Information in the Dictionary Entry: A Case Study’ In Hartmann R. R. K. (ed.), Lexicography. Critical Concepts. London: Routledge, 363–390.
Ilson R. F. 1987. ‘Illustrations in Dictionaries’ In Cowie A. P. (ed.), The Dictionary and the Language Learner. (Lexicographica Series Maior 17). Tübingen: Niemeyer, 193–212.
Just M. A., Carpenter P. A. 1980. ‘A Theory of Reading: From Eye Fixations to Comprehension.’ Psychological Review 87.4: 329–354.
Kaneta T. 2011. ‘Folded or Unfolded: Eye-Tracking Analysis of L2 Learners’ Reference Behavior with Different Types of Dictionary’ In Akasu K., Uchida S. (eds), Asialex2011 Proceedings. Lexicography: Theoretical and Practical Perspectives. Kyoto: Asian Association for Lexicography, 219–224.
Kemmer K. 2014a. Illustrationen im Onlinewörterbuch. Text-Bild-Relationen im Wörterbuch und ihre Empirische Untersuchung. Mannheim: Amades.
Kemmer K. 2014b. ‘Rezeption der Illustration, jedoch Vernachlässigung der Paraphrase? Ergebnisse einer Benutzerbefragung und Blickbewegungsstudie’ In Müller-Spitzer C. (ed.), Using Online Dictionaries. (Lexicographica Series Maior 145). Berlin: Walter de Gruyter, 251–278.
Klosa A. 2015. ‘Illustrations in Dictionaries; Encyclopaedic and Cultural Information in Dictionaries’ In Durkin P. (ed.), The Oxford Handbook of Lexicography. Oxford: Oxford University Press, 515–531.
Komogortsev O. V., Gobert D. V., Jayarathna S., Koh D. H., Gowda S. M. 2010. ‘Standardization of Automated Analyses of Oculomotor Fixation and Saccadic Behaviors.’ IEEE Transactions on Biomedical Engineering 57.11: 2635–2645.
Lew R., Grzelak M., Leszkowicz M. 2013. ‘How Dictionary Users Choose Senses in Bilingual Dictionary Entries: An Eye-Tracking Study.’ Lexikos 23: 228–254.
Luna P. 2013. ‘Picture This: How Illustrations Define Dictionaries’ In Luna P., Kindel E. (eds), Typography Papers 9. London: Hyphen Press.
Morales M. 2012. Sciplot: Scientific Graphing Functions for Factorial Designs (R Package Version 1.1-0). https://CRAN.R-project.org/package=sciplot.
Navarro D. 2015. Learning Statistics with R: A Tutorial for Psychology Students and Other Beginners. (Version 0.5). Adelaide, Australia: University of Adelaide. http://ua.edu.au/ccs/teaching/lsr
Nesi H. 1998. ‘Defining a Shoehorn: The Success of Learners’ Dictionary Entries for Concrete Nouns’ In Atkins B. T. S. (ed.), Using Dictionaries. Studies of Dictionary Use by Language Learners and Translators. (Lexicographica Series Maior 88). Tübingen: Niemeyer, 159–178.
Paivio A. 1990. Mental Representations: A Dual Coding Approach. Oxford: Oxford University Press.
Ptasznik B., Lew R. 2014. ‘Do Menus Provide Added Value to Signposts in Print Monolingual Dictionary Entries? An Application of Linear Mixed-Effects Modelling in Dictionary User Research.’ International Journal of Lexicography 27.3: 241–258.
R Core Team. 2015. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org.
Simonsen H. K. 2009a. ‘Se – og Du Skal Finde: En Eyetracking-Undersøgelse Med Særlig Fokus på De Leksikografiske Funktioner’ Nordiske Studier i Leksikografi 11. Rapport Fra Konference Om Leksikografi i Norden. Finland 3.-5. Juni 2009. Tampere: Nordisk forening for leksikografi, 274–288.
Simonsen H. K. 2009b. Vertical or Horizontal? That Is the Question: An Eye-Track Study of Data Presentation in Internet Dictionaries. Eye-to-IT conference on translation processes. Copenhagen Business School, Frederiksberg.
Simonsen H. K. 2011. ‘User Consultation Behaviour in Internet Dictionaries: An Eye-Tracking Study.’ Hermes 46: 75–101.
Singmann H., Bolker B., Westfall J. 2015. afex: Analysis of Factorial Experiments (R Package Version 0.15-2). https://CRAN.R-project.org/package=afex.
Stein G. 1991. ‘Illustrations in Dictionaries.’ International Journal of Lexicography 4.2: 99–127.
Svensén B. 2009. A Handbook of Lexicography: The Theory and Practice of Dictionary-Making. Cambridge: Cambridge University Press.
Tono Y. 2011. ‘Application of Eye-Tracking in EFL Learners’ Dictionary Look-up Process Research.’ International Journal of Lexicography 24.1: 124–153.
Wickham H., Francois R. 2015. dplyr: A Grammar of Data Manipulation (R Package Version 0.4.3). https://CRAN.R-project.org/package=dplyr.

© 2017 Oxford University Press. All rights reserved. For permissions, please email: email@example.com
International Journal of Lexicography – Oxford University Press
Published: Mar 1, 2018