Behav Res (2018) 50:1102–1115
DOI 10.3758/s13428-017-0929-z

Nicole Eichert (1,2) & David Peeters (1) & Peter Hagoort (1,3)

Published online: 8 August 2017
© The Author(s) 2017. This article is an open access publication

* David Peeters, email@example.com

1 Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
2 University of Oxford, Oxford, UK
3 Donders Institute for Brain, Cognition, and Behavior, Radboud University, Nijmegen, The Netherlands

Electronic supplementary material The online version of this article (doi:10.3758/s13428-017-0929-z) contains supplementary material, which is available to authorized users.

Abstract Predictive language processing is often studied by measuring eye movements as participants look at objects on a computer screen while they listen to spoken sentences. This variant of the visual-world paradigm has revealed that information encountered by a listener at a spoken verb can give rise to anticipatory eye movements to a target object, which is taken to indicate that people predict upcoming words. The ecological validity of such findings remains questionable, however, because these computer experiments used two-dimensional stimuli that were mere abstractions of real-world objects. Here we present a visual-world paradigm study in a three-dimensional (3-D) immersive virtual reality environment. Despite significant changes in the stimulus materials and the different mode of stimulus presentation, language-mediated anticipatory eye movements were still observed. These findings thus indicate that people do predict upcoming words during language comprehension in a more naturalistic setting where natural depth cues are preserved. Moreover, the results confirm the feasibility of using eyetracking in rich and multimodal 3-D virtual environments.

Keywords Virtual Reality · Prediction · Language Comprehension · Eyetracking · Visual World

Prediction is a key feature of human cognition (Friston, 2010), and anticipatory behavior is steadily gaining the interest of researchers from different fields. The notion that we adjust certain actions on the basis of knowledge of upcoming events has been demonstrated in many experimental studies and has inspired theoretical and computational accounts of predictive information processing. Helmholtz (1860) had already incorporated probabilistic, knowledge-driven inference into his models of the human sensory systems. More elaborate theoretical and computational models of predictive processing evolved side by side with the experimental evidence, and Clark (2013, p. 1) even claimed brains to be “essentially prediction machines.” Psycholinguistics is one research area that is strongly concerned with different aspects of prediction (see Huettig, 2015). The predictive nature of language processing is a matter of ongoing debate, and recent studies have aimed to disentangle prediction from related concepts such as preactivation, anticipation, and integration (for a review, see Kuperberg & Jaeger, 2016).

A pivotal tool in the study of prediction in psycholinguistics has been eyetracking. In a seminal study performed over 40 years ago, Cooper (1974) investigated the role of eye movements during spoken language comprehension. Participants listened to short stories while looking at a visual display that depicted several objects. Their eye movements were recorded, and it was found that at remarkably short latencies, eye gaze was directed to those objects that were mentioned in the spoken sentences or that were associated with the content of the narrative. These findings led to the conclusion that eyetracking is a useful tool “for real-time investigation of perceptual and cognitive processes and, in particular, for the detailed study of speech perception, memory and language processing” (Cooper, 1974, p. 84).
Further psycholinguistic studies elaborated on the paradigm introduced by Cooper, which was later termed the visual-world paradigm (VWP; see Huettig, Rommers, & Meyer, 2011; Tanenhaus, Spivey-Knowlton, Eberhard, & Sedivy, 1995).

In the typical screen-based “look-and-listen” variant of the VWP, participants are visually presented with line drawings or pictures of multiple objects on a computer screen. The auditory input in such studies is often a spoken word or a sentence that refers to the visual display in a certain manner defined by the experiment. An object that is directly mentioned in the spoken input is commonly referred to as the target object, whereas the other objects can be competitors or distractor objects. The underlying assumption of this variant of the VWP is that the auditory input is associated with a shift in attention that leads to an increased likelihood of fixating the target object relative to the other objects. Since the time to program a saccade can be reliably approximated as 200 ms (Matin, Shao, & Boff, 1993), eyetracking is a relatively precise method to study the timing of language comprehension in a visual context.

The use of the VWP has allowed researchers to draw important theoretical conclusions regarding the role of prediction in the online processing of spoken sentences. An influential eyetracking study in this domain was reported by Altmann and Kamide (1999), who presented participants with semirealistic scenes containing colored images of an agent (e.g., a boy), a target object (e.g., a cake), and several distractors (e.g., a ball, a toy car, and a toy train). While looking at the visual scenes on a computer screen, participants listened to simple spoken sentences that referred to the scene and the target object. Two experimental conditions were contrasted as a function of the relationship between the verb and the displayed objects. In the restrictive condition, the spoken sentence contained a verb that constrained the domain of subsequent reference so that only the target object could selectively be referred to by the verb (e.g., The boy will eat the cake, paired with a scene in which the cake is the only edible object). In the unrestrictive condition, the verb could relate to all presented objects (e.g., The boy will move the cake, paired with a scene in which all the depicted objects are moveable). The resulting eye movement patterns showed that participants launched saccades to the target object significantly earlier in the restrictive than in the unrestrictive condition. Critically, this increased probability of looks to the target object was observed before the onset of the noun. These results therefore support the hypothesis that information encountered at the verb gives rise to anticipatory eye movements to possible visible referents, which indicated that listeners predict upcoming words. Altmann and Kamide (1999) indeed concluded that the brain can project an unrealized grammatical object based on verb-mediated knowledge in a given visual context (but see Yee & Sedivy, 2006). These findings are furthermore in line with work that has suggested that sentence processing happens in an incremental and piecewise manner (e.g., Tanenhaus et al., 1995) and stressed the fact that upcoming words are actively predicted by the processor (Altmann & Kamide, 1999).

In their experiments, researchers in psychology and cognitive neuroscience often make use of two-dimensional (2-D) line drawings and pictures that are mere abstractions of real-world objects (e.g., Huettig & McQueen, 2007; Snodgrass & Vanderwart, 1980). Limiting the complexity of stimuli in such a way increases experimental control over the variables of interest, but the generalizability of the results to everyday language processing remains debatable (Henderson & Ferreira, 2004). By using semirealistic visual scenes and colored clip-art objects, the study outlined above aimed to study sentence processing in relation to “real-world contexts” (Altmann & Kamide, 1999, p. 247). It is an open question, however, whether the anticipatory eye movements observed when testing participants that look at static, artificial images on a computer screen generalize to everyday situations of sentence processing in typical, naturalistic contexts (Henderson & Ferreira, 2004). To address this issue of ecological validity, more recent studies have investigated anticipatory eye movements as a proxy of prediction in language processing by using complex, realistic photographs of rich visual scenes as stimulus materials. These studies conceptually replicated the original effect (e.g., Andersson, Ferreira, & Henderson, 2011; Coco, Keller, & Malcolm, 2016; Staub, Abbott, & Bogartz, 2012). Another study, however, suggested that the ecological validity of the VWP is possibly limited to situations that present a relatively small number of distractor objects (Sorensen & Bailey, 2007).

In the present study, we focused on a different element that increases the ecological validity of an experimental visual stimulus, namely its stereoscopic three-dimensional (3-D) presentation in an immersive 3-D visual context. After all, most objects we encounter in everyday communicative situations have at least three spatial dimensions. We adapted the variant of the VWP developed by Altmann and Kamide (1999) to be compatible with a virtual reality (VR) environment. VR environments preserve the stereoscopic depth cues that are inherent in naturalistic vision but have been absent in typical screen-based variants of the visual world paradigm. The simultaneous exposure to visual stimuli and related auditory input in VR leads to a more immersive character of the represented scene than traditional studies in which participants simply looked at a small computer monitor.

At a technical level, VR environments make use of various media in order to expose participants to a computer generated simulation. Though possibly all sensory modalities can be included, the visual and auditory domains are most commonly subject to virtual simulation (e.g., Slater, 2014). Stereoscopic vision and the percept of a 3-D space including depth are elicited by displaying two horizontally displaced images to the left and the right eye. In the present study, we presented the visual input by using projection screens in a cave automatic virtual environment (CAVE). A CAVE system consists of several projection surfaces that form a cubic space surrounding the participant (Cruz-Neira, Sandin, & DeFanti, 1993). Participants wear active shutter glasses that create a stereoscopic 3-D image by rapidly alternating between displaying and blocking the image intended for the respective eye. The timing of the alternation is coupled to the refresh rate of the projection screens, so that both devices work synchronously. Due to the high alternation frequency a coherently fused image is perceived. The glasses are furthermore part of a tracking system that monitors the position and direction of the participant's head, controlling the correct perspective of the visual display.

Despite its powerful potential of combining experimental control and ecological validity, the use of VR in psycholinguistics has remained virtually nonexistent. Initial psycholinguistic studies using VR confirm the validity of this novel method by indicating that people speak to virtual interlocutors the way they speak to human interlocutors, and that they process speech produced by virtual agents in a similar way to that produced by human speakers. A study on language production in dialogue, for instance, demonstrated that natural linguistic-priming effects occur when participants interact with a human-like virtual agent (Heyselaar, Hagoort, & Segaert, 2017). It has also been found that participants accommodate their speech rate (Staum Casasanto, Jasmin, & Casasanto, 2010) and pitch (Gijssels, Staum Casasanto, Jasmin, Hagoort, & Casasanto, 2016) to the speech rate and pitch of their virtual interlocutors. Recent EEG evidence has suggested that similar cognitive and neural mechanisms may underlie speaking and listening to virtual as well as to human interlocutors (Peeters & Dijkstra, 2017; Tromp, Peeters, Meyer, & Hagoort, in press). These initial findings indicate the feasibility of using VR as a method to test whether traditional experimental findings may generalize to more naturalistic settings.

The present study

The purpose of the present study was twofold. First, we aimed to conceptually replicate the findings of Altmann and Kamide (1999) in an immersive 3-D VR environment. This meant specifically that we tested for verb-mediated anticipatory eye movements to a visually presented target referent in a CAVE environment. Second, in doing so, we tested whether it is methodologically feasible to combine VR and eyetracking in the study of online language processing in a multimodal 3-D environment.

The conceptual replication we tested for in the present study was necessary before follow-up studies could start making use of the unique affordances of immersive VR in the domain of predictive language processing. If, for instance, a first VR study on predictive language processing in a visual environment were to make use of the full communicative, interactive, and audiovisual potential offered by this novel method, thereby finding results not in line with the original Altmann and Kamide (1999) claims, it would be unclear whether such a discrepancy were due to the VR method leading to different behavior than traditional methods or to the increase in ecological validity that the VR method affords. If the present study were to conceptually replicate the original findings, future studies could build on these results by using manipulations that can only be implemented in VR. The potential of VR lies in its increased ecological validity, as compared to screen-based studies. Rather than being a passive observer of stimuli on a computer screen, participants in a virtual environment themselves become part of the depicted scene. Whereas an increase in ecological validity often results in a decrease in experimental control, immersive VR has the potential to combine the naturalness of everyday interaction with a degree of experimental control that is to be desired by the experimental psychologist or cognitive neuroscientist.

Several changes were made to the original paradigm in order to avoid some confounding factors in the original study and to make the paradigm compatible with presentation in a 3-D virtual environment to Dutch participants. The most significant difference between the original study and our experiment was the mode of stimulus presentation. Altmann and Kamide (1999) had displayed the scenes on a relatively small computer screen (17 in.), and the stimuli were created using a 16-color palette. For the present study, we generated 3-D color objects rich in detail that were presented in an immersive virtual environment that features stereoscopic vision. Unlike in the original study, we aimed to keep the context information conveyed by the agent in the visual scenes minimal, because the identity of the agent can cause confounding predictive eye movements (Kamide, Altmann, & Haywood, 2003). A craftsman, for example, might be strongly associated with a machine-like object in the scene, regardless of the information extracted from the verb. Therefore we presented virtual agents that were of neutral appearance. Also in contrast to the original study, we made sure that the virtual agent was not looking at one of the objects, and we kept the number, animacy, and positions of objects per scene and relative to the background scenery constant. Furthermore, we controlled the verb materials for several linguistic features, including word length and frequency, to rule out that these parameters modulated the anticipatory effect. The present study was carried out in Dutch and with Dutch verbal materials (see Fig. 1 for an example). Dutch future tense places the (restrictive or nonrestrictive) target verb after the target noun, rendering the use of future tense impossible for investigating verb-based predictive processing in this version of the VWP. To assure that listeners would interpret each sentence as referring to an action that would take place in the future, the adverb dadelijk (“shortly, soon”) was included in each sentence (cf. Hintz, 2015). Finally, we doubled the number of trials in order to increase statistical power, and performed a finer-grained evaluation of eye movement patterns by implementing a logistic regression analysis that would overcome the problems associated with the use of traditional analyses of variance in analyzing eyetracking data.

Fig. 1 Example scene used in the experiment. Participants listened to the sentence De man eet dadelijk de meloen (“The man will soon eat the melon”) or De man draagt dadelijk de meloen (“The man will soon carry the melon”) while viewing the scene

We hypothesized that if the original findings of Altmann and Kamide (1999) generalized to situations of stereoscopic vision and immersed language processing, we should find anticipatory eye movements to the target object before noun onset for the restrictive condition only. The absence of such an effect would put in question whether the original findings can generalize to more naturalistic viewing conditions.

Method

Participants

Twenty-one native speakers of Dutch (19 female, two male; 19–27 years of age, mean age = 21.9) took part in the main experiment. The data of 30 participants were recorded, but nine were excluded due to the insufficient accuracy or quality of the eyetracking data. Participants were recruited via the online participant database of the Max Planck Institute for Psycholinguistics. They gave written informed consent prior to the experiment and were monetarily compensated for their participation. All participants had normal hearing and vision. The study was approved by the ethics board of the Social Sciences Faculty of Radboud University, Nijmegen, The Netherlands.

Selection of stimulus materials

Thirty-two 3-D visual scenes were designed and were each paired with two spoken sentences. The scenes displayed a computer-generated virtual agent sitting in a backyard surrounded by four objects (see Fig. 1 for an example). The virtual agent represented either a female or a male person. Both virtual agents were adapted from a stock avatar produced by WorldViz (2016) and appeared to be Caucasians in their mid-twenties (sportive06_f_highpoly and casual13_m_highpoly). The virtual agents were sitting cross-legged on the virtual floor, and their facial expression was a modest smile. The gaze of the virtual agent was directed to the virtual floor between the agent and the participant, without showing a preference for looking at any of the objects. The virtual background environment showed a simple backyard scenery with brick walls and surrounding trees.

The experimental spoken sentences were in Dutch and had a simple subject–verb–adverb–object structure, such as De man eet dadelijk de meloen (“The man will soon eat the melon”). All sentences described an action that could apply to the presented visual display, and for each scene the two sentences differed only with respect to the verb. Depending on the presentation of a female or a male virtual agent, the subject of the sentence was changed accordingly (de man “the man” or de vrouw “the woman”).

To contrast two experimental conditions, we generated verb pairs that consisted of a restrictive and an unrestrictive verb. A restrictive verb imposed constraints on its arguments such that only one of the visually presented objects was a plausible argument. In the following, this object will be referred to as the target object of the scene. In the example illustrated in Fig. 1, the melon is the target object, because among the four visible objects, it is the only appropriate argument for the verb to eat. The unrestrictive verb in this scene, to carry, does not narrow down the domain of subsequent reference, because all four presented objects can function as plausible arguments.

Sixty-four verbs were selected from the Dutch Lexicon Project 2 database (DLP2; Brysbaert, Stevens, Mandera, & Keuleers, 2016), which contains several lexical measures for 30,000 Dutch lemmas, including length and frequency (see all measures listed in Appendix 1). The verbs were equally distributed over the restrictive and unrestrictive conditions, so that the linguistic features of the infinitive verb forms in the two conditions were comparable. We controlled for lexical characteristics obtained from the database and computed statistical comparisons in R (R Development Core Team, 2015). To obtain p values, we first assessed the assumptions for a Student's t test by applying a Shapiro–Wilk test and an F test in order to check for normal distribution and equal variances. A two-tailed t test was performed if the measures met the statistical assumptions; otherwise, a Wilcoxon rank sum test was used. The measures for the mean and standard deviation (SD) or, for nonnormal parameters, the median and median absolute deviation are provided in Appendix 1, together with the corresponding p values. The verb pairs are listed in Appendix 2.
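The decision rule for choosing a test, and the two kinds of descriptive statistics reported in Appendix 1, can be sketched in a few lines. This is only an illustration in Python, not the R code used for the reported comparisons. The function names, the .05 threshold, and the choice to pass in p values computed elsewhere are all assumptions. The median absolute deviation is computed here without a scaling constant, whereas R's mad() applies a constant of 1.4826 by default; which variant was reported is not stated in the text.

```python
import statistics

ALPHA = 0.05  # assumed threshold for the assumption checks; not given in the text


def choose_test(p_normal_a, p_normal_b, p_equal_var, alpha=ALPHA):
    """Name the comparison to run for one lexical measure.

    The three p values are assumed to come from Shapiro-Wilk tests on the
    restrictive and unrestrictive verb groups and from an F test on their
    variances, all computed elsewhere.
    """
    assumptions_met = min(p_normal_a, p_normal_b, p_equal_var) > alpha
    if assumptions_met:
        return "two-tailed t test"
    return "Wilcoxon rank sum test"


def describe(values, normal):
    """Return (centre, spread) for one group of values.

    normal=True  -> mean and sample standard deviation
    normal=False -> median and unscaled median absolute deviation
    """
    if normal:
        return statistics.mean(values), statistics.stdev(values)
    centre = statistics.median(values)
    spread = statistics.median([abs(v - centre) for v in values])
    return centre, spread
```

A measure for which any of the three checks fails would thus be compared with the rank-based test and summarized by its median and median absolute deviation.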
For each verb pair, we selected four names of objects that could function as grammatically correct postverbal arguments within the sentence (see Appx. 2). The four object names for a given scene were selected to be of the same grammatical gender, such that the Dutch definite article (de or het) would correspond to all objects within the scene. This ruled out the determiner interfering with the possible prediction formed on the basis of the verb. For each scene, only one of the four objects was a plausible argument for the restrictive verb. This object was the target of the scene, whereas the other three objects were distractors. The relation between the restrictive verb and the distractors either was not in accord with real-world experience or was highly unlikely. The unrestrictive verb did not impose any semantic restrictions on the argument, so that the target and distractors were equally likely to be referred to as possible arguments of the verb. Consider, for example, the verb pair eat/carry and the objects melon, watering can, chair, and barbell. Since the melon is the only edible object among the four, it is considered the target object. All four objects, however, can be regarded as plausible arguments for the unrestrictive verb to carry.

The object names within one scene started with different phonemes in order to avoid phonological activation of the distractor objects. Allopenna, Magnuson, and Tanenhaus (1998) showed increased fixation probabilities for distractors that began with the same onset and vowel as the target object. Semantic relatedness is another confounding factor that has been shown to influence fixation behavior (Yee & Sedivy, 2006). For the given paradigm, however, it was impossible to completely avoid thematic associations between the four objects. Since all objects could serve as the argument for the unrestrictive verb, they shared at least one semantic feature with respect to their relation to the unrestrictive verb. For example, the verb proeven (“to taste”) is only plausibly related to food items, and objects for the verb reinigen (“to clean”) are commonly associated with objects in the household.

Semantic similarity has been shown to predict language-mediated eye movements in visual-world studies (Huettig, Quinlan, McDonald, & Altmann, 2006). To support our selection of object names, we measured the relatedness of verb–object pairs on the basis of semantic spaces. The semantic distances for the eight possible verb–object combinations within each scene were computed using the open-source web tool snaut (http://zipf.ugent.be/snaut-dutch; Mandera, Keuleers, & Brysbaert, 2017). The snaut tool computes a measure of semantic relatedness based on the count of co-occurrences of two lemmas in a large corpus. This principle is adapted from latent semantic analysis (Landauer & Dumais, 1997), which is a method for measuring semantic similarity of texts using corpus-based measures. The algorithm underlying snaut is a predictive neural network that makes use of a Continuous Bag of Words model (CBOW) architecture. Co-occurrences are obtained within a window of ten words that slides through the entire corpus. By adjusting weights, the network learns which words are related to each other. The relatedness of words is quantified by calculating the cosine distance of the two vector representations within a 200-dimensional space. The toolbox makes use of the Dutch SONAR-500 text corpus (Oostdijk, Reynaert, Hoste, & Schuurman, 2013) and a corpus of Dutch movie subtitles.

We compared the values for the “restrictive verb + target” pairs to the mean values of the three “restrictive verb + distractor” pairs, pair-wise for each scene. The semantic distance of the word pairs in the restrictive condition differed significantly (paired Wilcoxon test, n = 64, p < .001). Comparing word pairs from the unrestrictive condition revealed no significant difference (paired Wilcoxon test, n = 64, p > .05). The semantic distances are illustrated in Fig. 2.

Fig. 2 Semantic distances measured with the snaut tool. The data points for the target represent the semantic distance of single-word pairs (e.g., “restrictive verb + target”), and the data points for the distractors represent mean values of the three distractor word pairs in one scene. Pairwise comparison of the target and distractor pairs in the restrictive condition revealed a significant difference (paired Wilcoxon test, n = 64, p < .001)

The semantic relatedness results show that the restrictive verbs were semantically more closely related to the target than to the distractor objects. For unrestrictive verbs, the semantic distances in the two object categories did not differ. This pattern of semantic similarity was the essential characteristic defining the two experimental conditions. In the restrictive condition, only the target object was a plausible argument for the verb, implying a closer semantic relationship between verb and object. This result thus confirms that our selected combinations of verbs, targets, and distractors were suitable for the experimental questions we wanted to assess.

Sentence recordings and annotation

The sentences were spoken by a female native speaker at a normal rate with neutral intonation. The recording was performed in a soundproof booth, sampled at 44.1 kHz (stereo, 16-bit sampling resolution), and stored digitally on computer. The audio file was chopped into individual audio files for each sentence using Praat (Boersma & Weenink, 2009), and all files were equalized for maximal amplitude.
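Equalizing files for maximal amplitude is read here as peak normalization: each clip is scaled by a single gain factor so that its largest absolute sample value matches a common target. The sketch below illustrates that reading in plain Python and is not the Praat procedure applied to the recordings. The function name, the target peak of 0.99, and the representation of a clip as a list of floats between -1 and 1 are assumptions made for illustration.

```python
def peak_normalize(samples, target_peak=0.99):
    """Scale a waveform so its largest absolute sample equals target_peak.

    samples: sequence of floats (assumed range -1.0 to 1.0).
    target_peak: hypothetical common peak level shared by all clips.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        # A silent or empty clip has no peak to scale; return it unchanged.
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]


# Two made-up clips with different loudness end up with the same peak level.
quiet_clip = [0.05, -0.10, 0.08]
loud_clip = [0.40, -0.80, 0.60]
normalized = [peak_normalize(clip) for clip in (quiet_clip, loud_clip)]
```

Because one gain factor is applied to the whole clip, the relative shape of the waveform is preserved and only the overall level changes.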
Sentences were annotated by placing digital markers at the onsets and offsets of critical words: verb onset, verb offset, determiner onset, and noun onset. The mean duration of the sentences was 3,065 ms (SD = 314), and the positions of the markers did not differ between the sentences of the restrictive and unrestrictive conditions (see Table 1).

Table 1 Comparison of the sentences in the two experimental conditions (restrictive and unrestrictive) for the critical on- and offset time points

                 Verb Onset   Verb Offset   Determiner Onset   Noun Onset    Total Duration
Restrictive      728 (128)    1,299 (162)   2,103 (256)        2,244 (268)   3,048 (319)
Unrestrictive    730 (115)    1,315 (182)   2,132 (289)        2,263 (294)   3,082 (313)
p Value          .92          .62           .56                .71           .54

Mean values and standard deviations (in parentheses) are provided in milliseconds. All p values were obtained using two-tailed Student's t tests.

3-D virtual objects

A graphics designer created 3-D objects for the virtual environment using the 3-D computer graphics software Autodesk Maya (Autodesk Inc., 2016). The 128 objects were designed to represent a stereotypic instance of the objects that we had selected for the stimulus set. The texture that was added to the objects' surface was either custom-made in the graphics software or taken from freely available pictures from the Internet.

Objects were presented as much as possible at their expected real-world size, but in certain cases they had to be scaled to account for the influence of object size on visual attention. Larger objects attract more visual attention than smaller objects, and are therefore more likely to be fixated by chance. We thus aimed to keep the sizes of the target and distractor objects comparable. We quantified object size by measuring the volume of a virtual bounding box, which is the regular cuboid including the entire object. The perceived size of an object changes depending on its position in the virtual scene, but the size of the bounding box is a constant value of the item. The volume of the target objects did not differ from the average volume of the three distractor objects (two-tailed, paired Student's t test on logarithmic values, n = 32, p = .34).

The positions of the objects were determined on the basis of a hypothetical grid on the virtual ground, represented in Fig. 3. The virtual space in the computer software is described by means of a coordinate system wherein the x-axis represents the horizontal dimension, the y-axis the vertical dimension, and the z-axis the depth. The root of the coordinate system (0/0) was defined as the point of the middle line that was “closest” to the observer. The exact x- and y-positions of the four objects were (–1/3.8) for Object 1, (1/3.8) for Object 2, (1.5/2.2) for Object 3, and (–1.5/2.2) for Object 4. The space in front of and behind the virtual agent was never occupied by an object. Due to the 3-D perspective, items placed behind the agent would be at least partly hidden, and objects located in front of the agent would attract disproportionately high attention.

Fig. 3 Hypothetical grid on the virtual ground used for the main experiment. The numbers indicate the positions of the four objects. The virtual agent was located in the middle of the screen and occupied two subspaces (red shading) where no objects were located. The grid was not visible during the experiment

Pretest: Identifiability of 3-D virtual objects

Twelve native speakers of Dutch (10 female, two male; 20–29 years of age, mean age = 24.1; they did not participate in the main experiment) took part in a naming pretest that was conducted to ensure that the 3-D objects were identifiable in VR. Thirty-five virtual scenes were presented in a virtual environment that was similar to that of the main experiment. Each scene thus displayed a virtual agent sitting in a backyard surrounded by four objects. In total, 140 objects were presented to the participants. The order of trials was randomized across participants, and the gender of the virtual agent was alternated across trials. We chose a stimulus display similar to that in the main experiment to assess the identifiability of the objects when they are presented in sets of four together with a virtual agent. Each object was randomly assigned to one of eight positions on the virtual ground (see Fig. 3).

The participants in the pretest were seated in a comfortable chair in the middle of the CAVE system, and a laptop for typing their answers was placed on their lap. During the pretest they wore the VR glasses, softly fastened using a strap on their head to ensure stability, and calibrated by a single calibration step for the head-tracking signal. They were instructed to type the four object names for each trial into the laptop. For recording their answers, we used a custom-made MATLAB script (The MathWorks Inc 2013), which prompted the participants to enter their answer for each scene one after the other. Typing in the answers was self-paced, and the participants indicated that they were ready for the next trial by raising their hand. During completion of the task, the participant was alone in the CAVE, but the experimenter was able to give further instructions, if necessary, via a microphone in the control room.

Answers were manually coded offline with regard to the correct identifiability of the object and the preferred object name. Objects that were incorrectly named by more than 25% of the participants (i.e., four out of the 12) were excluded from the stimulus set. If different synonymous names were given as answers, we selected the object name that was used by the majority of the participants. A set of 128 suitable object names was then selected for the main experiment so that the criteria described above were met (e.g., no overlap in the first phoneme for objects in the same scene). Eight objects with insufficient identifiability were selected to serve for the practice trials in the main experiment. The final set of object names is listed in Appendix 2, together with the corresponding verb pairs.

Apparatus

The CAVE system

The CAVE system consisted of three screens (255 × 330 cm, VISCON GmbH, Neukirchen-Vluyn, Germany) that were arranged at right angles as illustrated in the schematic drawing in Fig. 4. Two projectors (F50, Barco N.V., Kortrijk, Belgium) illuminated each screen indirectly through a mirror behind the screen. The two projectors showed two vertically displaced images that overlapped in the middle of the screen (see Fig. 4b). Thus, the complete display on each screen was only visible as the combined overlay of the two projections.

For optical tracking, infrared motion capture cameras (Bonita 10, Vicon Motion Systems Ltd, UK) and the Tracker 3 software (Vicon Motion Systems Ltd, UK) were used. The infrared cameras detected the positions of retroreflective markers by optical–passive motion capture. Six cameras were positioned at the upper edges of the CAVE screens, and four cameras were placed at the bottom edges. All cameras were oriented toward the middle of the CAVE system. The positions of the cameras are indicated in Fig. 4a.

Participants were sitting in a chair in the middle of the CAVE system so that the three screens covered their entire horizontal visual field. The eyes of the participant were approximately 180 cm away from the middle screen, so that 90° of the vertical visual field were covered by the display. The four objects that were presented in each virtual scene extended across approximately 80° of the horizontal visual field. The control room was located next to the experimental room containing the CAVE system. The experimenters could visually inspect the participant and the displays on the screens through a large window behind the participant.

The experiment was programmed and run using 3-D application software (Vizard, Floating Client 5.4, WorldViz LLC, Santa Barbara, CA), which makes use of the programming language Python. Spatial coordinates and distances in the VR environment are expressed as dimensionless numbers. The software translates the numbers one-to-one into virtual meters, but due to the adjusted object sizes, the numbers can be understood as relative rather than absolute measures.

Sound was presented through two speakers (Logitech, US) that were located at the bottom edges of the middle screen at the positions indicated in Fig. 4. The auditory signal was detected by a custom-made NESU-Box (Nijmegen Experiment Set Up, serial port), so that the on- and offset of the sentence were recorded online in the data stream.

Fig. 4 Schematic drawing of the CAVE system. (a) Top view indicating the configuration of the screens, the position of the participant, the infrared motion capture cameras, and the speakers. Red points represent cameras located at the upper edges of the screens, and purple points represent cameras at the bottom edges. The lower projectors are depicted for illustration purposes only. (b) Side view of one pair of projectors that illuminate the screen indirectly, via mirrors

Eyetracking

Eyetracking was performed using special glasses (SMI Eye-Tracking Glasses 2 Wireless, SensoMotoric Instruments GmbH, Teltow, Germany) that combine the recording of eye gaze with the 3-D presentation of VR. The recording interface used is based on a Samsung Galaxy Note 4 that is connected to the glasses by cable. The recorder communicates with the externally controlled tracking system via a wireless local area network (wifi), which enables live data streaming.

... showed up as unstable and irregular movements of the gaze position.
At this point, it is unclear whether the relatively high num- The glasses were equipped with a camera for binocular 60-Hz berofparticipantswhohadtobeexcludedwasduetoadifference recordings and automatic parallax compensation. The shutter in eyetracker quality between the present and previous studies or device and the recording interface were placed on a table behind whether we simply happened to recruit a relatively high subset of the participants during recording. Gaze tracking accuracy was participants with pupils that would have been hard to detect by estimated by the manufacturer to be 0.5° over all distances. We any eyetracker. found the latency of the eyetracking signal to be 200 ± 20 ms. By combining eyetracking and optical head-tracking, we were able to identify the exact location of the eye gaze in three spatial Regions of interest To determine target fixations, we defined dimensions, allowing participants to move their heads during the individual 3-D regions of interest (ROIs) around each object in experiment. Optical head-tracking was accomplished by placing the virtual space. The x (width) and y (height) dimensions of the light reflectors on both sides of the glasses. Three spherical reflec- ROI were adopted from the frontal plane of the object’sindivid- tors were connected on a plastic rack and two of such racks with a ual bounding box, facing the participant. We adjusted the size of mirroredversion ofthe givengeometry weremanually attachedto thisplanetoensureaminimalsizeoftheROI.Theminimalwidth bothsides oftheglassesusing magnetic force.Thereflectorswork was set to 0.8 and the minimal height to 0.5. For the presented as passive markers that can be detected by the infrared tracking layoutsofobjects,theadjustedxandydimensionsweresufficient system in the CAVE. The tracking system was trained to the spe- to characterize the ROIs. 
Despite the 3-D view, the plane covered cific geometric structure of the three markers and detected the thewholeobjectsufficientlytocaptureallfixations.Thezdimen- position of the glasses with an accuracy of 0.5 mm. sion (depth) of the ROI was therefore set to a relatively small Calibration of the eyetracker required two separate steps: value of 0.1. An increased z value of the ROIs would not have One for the position of the head within the optical tracking beenmoreinformativeaboutthegazebehavior,butitwouldhave system and one for the position of the pupil monitored by the ledtooverlappingROIsinsomecases. The eyetracking software camera within the eyetracking glasses. For the calibration pro- automatically detected when the eye gaze was directed at one of cedure we used a virtual test scenery. This environment re- the ROIs and coded the information online in the data stream. sembled the inside of an Asian tea house and three colored Some previous studies have used the contours of the objects to spheres were displayed in front of the participants. The posi- define ROIs, but rectangles have been shown to produce qualita- tion of the tree spheres differed in all three spatial coordinates. tively similar results (Altmann, 2011). Duringthefirstcalibrationstep,participantswereaskedtolook atthethreedisplayedspheres successively.Theexperimenterwas Design and procedure present in the CAVE system and the calibration scene was also displayed on the recording interface. The experimenter selected The participants in the main experiment were seated in a com- thecorrespondingspherethatwasfixatedbytheparticipantonthe fortable chair in the middle of the CAVE system and were famil- recording interface. The second calibration step was performed iarized with the upcoming procedure. They put on the VR using Vizard software in the control room. 
The same test scenery glasses, which were softly fastened using a strap on their head and instructions were used and the experimenter could commu- to ensure stability. Prior to the start of the experiment, we per- nicate with the participants via the microphone. The computer formed the two calibration steps as described above. The calibra- software computed a single dimensionless error measure of the tionscreenwasfurthermoreusedto test whether the stereoscopic eyetracker combining the deviance in all three coordinates. The display produced by the shutter glasses was working correctly. computer-based calibration was repeated until a minimal error Participants were asked to remain seated during the experi- value (<5), and thus maximal accuracy, was reached. ment and not to move the glasses. No specific instructions were The accuracy of the eyetracker could not be assessed quanti- given,besidestocarefullylistentothesentencesandtolookatthe tatively. An error message occurred during the initial calibration display. Unlike typical eyetracking experiments, they were step if the eyetracker failed to detect the pupil with sufficient allowed to move their head. The experimental trials were preced- accuracy. In these cases, the participants were excluded from ed by two practice trials. Between trials, the empty virtual envi- the experiment. Retrospective assessment of the tracking quality ronment was presented without objects and virtual agent, but for eachparticipantwas performedusingcustom-madeplayback with a central fixation cross at the position of the agent’shead. software. The software illustrated the movement of the recorded Thecrossappearedfor1sandparticipantswereaskedtofixateon gaze position in a 3-D computer display together with the corre- the cross, whenever it appeared. The scene was presented for a sponding 3-D scenery. This display mode was used to visually preview time of 2 s before the audio file was played. 
The preview inspect calibration quality and accuracy. Calibration quality was time ensured that participants had enough time to encode visual assessed by inspecting the deviation of the gaze position during information and generate expectations (Huettig & McQueen, the presentation of the fixation cross. Low tracking accuracy 2007) despite the unfamiliar setting in a VR environment. 1110 Behav Res (2018) 50:1102–1115 Participants were presented with two experimental blocks Regression models have been shown to be a more appropriate of 32 trials each. The second block contained trials with the framework for analyzing eyetracking experiments. They capture reversed condition (restrictive vs. unrestrictive verb), such that the temporal dynamics of the gaze behavior by treating time as a each participant was exposed to all of the possible 64 exper- continuous variable. We transformed the dependent variables in imental trials. Between the two blocks, recalibration was per- the regression analysis using an empirical logit link function, formed using the Vizard software. The experimental condition which is the appropriate scale for assessing effects on a binary for each trial was determined on the basis of a pseudo- categorical dependent variable (Barr, 2008). The empirical logit randomization procedure. Four lists of trials were generated. is an approximation of the log odds transformation, which allows Each participant was assigned to one of the four lists, such that for a tolerance such that infinity is not returned when the argu- the design was counterbalanced with respect to the experimen- ment is 0 or 1. Specifically, we performed a weighted linear tal condition, gender of the virtual agent, and experimental regression over empirical logits (Barr, 2008). block. The order of trials within each block was randomized Because we designed a within-subjects and within-item exper- while keeping the gender of the virtual agent alternating. 
iment, a multilevel logistic regression was performed. In a mixed- After the experiment, participants underwent a short effect approach, nonindependence on the level of subjects and debriefing interview to assess whether they had recognized items was modeled by means of random effects. This approach the experimental manipulation. They were then informed makes it possible to control for their associated intraclass correla- about the actual aim of the study. Moreover, we informed tion, such as with random intercepts. The data for different trials them about the fact that we had recorded their eye movements, and subjects do not need to be pooled together, as is traditionally and all participants gave consent to use their eyetracking data done using ANOVAs. As a fixed effect in the main analysis, we for the purpose of the present study. modeled condition (restrictive vs. unrestrictive), time (as bin), and As we outlined above, the second block presented partici- their interaction. Statistics were calculated using mixed-effects pants with the same visual scenes as in the first block, but with models from the lme4 package (Baayen, Davidson, & Bates, the verb form from the opposite experimental condition. The 2008) in the R environment. In the Results section, we report the debriefing revealed that participants had noticed this, which parameter estimate (Est), standard error (SE), and p value for ef- led to an increase in fixations on the target object even before fects of interest. The variables condition and bin were contrast- verb onset. Therefore, we restricted the main data analysis to codedtofacilitatetheinterpretationofpossibleinteractionsandthe Block 1. The analysis of Block 2 nevertheless showed the directions of the effects. To avoid power reduction due to collin- same critical effect preceding noun onset (see the earity, we centered both variables by mean subtraction. supplementary materials). 
Data were acquired at a sampling frequency of 60 Hz, which means thatapproximately every 17 ms one sample was recorded. We corrected for the 200-ms latency shift caused by the Statistical analyses eyetracking system by time-locking the data to 200 ms (12 sam- Although many visual-world studies have analyzed ples) after sentence onset. As a variable of interest we defined the eyetracking data using t tests and analyses of variance proportion of fixations on the target object. A fixation was de- (ANOVAs), the data usually violate the underlying statistical finedasalookatthesameROIthatlastedatleast100ms—thatis, assumptionsforsuchtests(Jaeger,2008).Thefundamentalprob- six subsequent samples were coded as Bhits^ for the same ROI lemisthatANOVAsaredesignedtotesttheeffectofacategorical (see, e.g., Ettinger et al., 2003; Manor & Gordon, 2003; Sanchez, variable on a continuous variable. Most visual-world studies, Vazquez, Gomez, & Joormann, 2014; Sekerina, Campanelli, & however, assess the effect of a continuoustemporal phenomenon Van Dyke, 2016). This correction to the whole experimental (e.g., spoken language) on a categorical variable (e.g., a fixated dataset led to the exclusion of 2.8% of all samples in which a object).ForthesakeofANOVAs,timeisoftentransformedintoa Bhit^ to a predefined ROI was detected. The fixation data were categorical variable by collapsing the data into a series of time then aggregated into time bins of 50 ms (i.e., three samples) by windows (e.g., Hanna, Tanenhaus, & Trueswell, 2003)and ag- participant, trial, and condition. gregating over trials and subjects. Collapsing data into time win- Forthemainanalysis,weassessedtheeffectoftheexperimen- dows can, however, obscure effects such as anticipatory eye tal condition (restrictive vs. unrestrictive) on the proportions of movements and other dynamics. 
Furthermore, gaze behavior is target fixations over a critical time window of 1.5 s that spanned usually coded as a binary variable (0 = BROI not hit^ and 1 = from 200 ms (i.e., 12 samples) after verb onset until the average BROI hit^) and then transformed into a continuous variable by noun onset. The onset of the critical time window was based on calculating fixation proportions. The proportions and their con- previous evidence about saccadic planning (Matin et al., 1993); fidence intervals are only defined on a range from 0 to 1. 200 ms after verb onset is the earliest point at which the linguistic ANOVAs, however, assume unbound and homogeneous vari- stimulus can drive fixations to the target object. We further ances and might therefore produce spurious results (Jaeger, assessed the validity of this starting point by visually inspecting 2008). the grand means of the fully aggregated dataset. Behav Res (2018) 50:1102–1115 1111 Prior to the main analysis on the critical time window, we time. This pattern confirms that before noun onset the tar- eliminated fixations that had been initiated before the onset of gets—and not the distractors—attracted more fixations in the the verb and that extended into that time window, as had restrictive condition only. previous studies (e.g., Altmann & Kamide, 1999). This cor- rection led to the elimination of 9.6% of all saccades. Test for baseline effects Moreover, we analyzed baseline effects to check whether any confounding effect was present in the baseline period Finally,weanalyzedthebaselineperiod(i.e.,theintervalbetween before the experimental manipulation. time 0 and verb onset) preceding the critical time window, to test whether any confounding interactions were present before the onset of the experimental manipulation. Visual inspection of this Results baseline period in Fig. 5 suggests that there was no difference in the proportions of looks to the target versus the distractors in this Main analysis time window. 
The mean difference between target and distractor fixations across this time window was indeed <.001. We per- For themainstatistical analysis, wedefined acriticaltime window formed a linear regression on the differences in proportions to in which we expected the experimental manipulation to have an confirm that the difference did not change over time. The statis- effect on the proportions of target fixations. We chose the onset of tical model, which included time as a fixed factor and subject and –3 the critical window as 200 ms after verb onset, assuming that it trial as random factors (Est =6.67× 10 , SE =0.01, p = .82), takes approximately 200 ms to plan and initiate a saccadic move- confirmed the absence of differences during the baseline period. ment (Matin et al., 1993). As the offset of the criticaltime window, we chose the average onset of the noun, approximately 1,500 ms after verb onset. Discussion The main statistical analysis was hence performed on the crit- ical time window between the verb and noun onsets. We per- The purpose of the present study was twofold. First, we aimed to formed a regression analysis using a linear mixed model. As the conceptually replicate the findings of Altmann and Kamide dependent variable we entered the empirical logits of the propor- (1999) in an immersive 3-D VR environment. Second, in doing tions of target fixations. We modeled time (as a mean-centered so, we tested whether it is methodologically feasible to combine bin), condition (effect-coded), and their interaction as fixed ef- VRand eyetracking in the studyof online language processing in fects, and subject and trial as random effects. The fixation pro- a multimodal 3-D CAVE environment. Our successful concep- portions time-locked to sentence onset are illustrated in Fig. 5. 
tual replication of the original study indicates that the previous The model revealed that all fixed effects were significant findings do generalize to richer situations of stereoscopic 3-D (condition: Est =0.34, SE =0.02, p < .001; time: Est =0.02, vision. Methodologically, the present study confirms the feasi- –3 SE=1.01× 10 , p < .001; Condition × Time: Est=0.02, SE = bility of measuring eye movements in a rich 3-D experimental –3 2.02 × 10 , p < .001). This means that fixations to the target virtual environment, and it may therefore serve as a basis for increased over time and that the target was fixated more often future implementations that go beyond conceptual replication. during the restrictive condition. The significant interaction Altmann and Kamide (1999) presented visual stimuli that between condition and time reflected that the increase of target depicted seminaturalistic scenes. They argued that the predictive fixations was more pronounced in the restrictive condition. We performed the same analysis on the mean distractor fixations. The model revealed the same significant effects, but with a reversed sign for the influence of condition (condi- –3 tion: Est = –0.16, SE =9.17 ×10 , p < .001; time: Est =5.27 –3 –4 ×10 , SE =5.23 × 10 , p < .001; Condition × Time: Est = – –3 –3 8.41 × 10 , SE =1.06 ×10 , p < .001). Visual inspection of the gaze patterns (Fig. 5) suggested that fixations to the distractor objects in the critical time window can be qualita- tively described as mirroring the target fixations. These results are consistent with the hypothesis that partic- ipants directed their gazes to the target object more often and Fig. 5 Proportions of looks to the targets and distractors. The collapsed data are averaged across all participants (N = 21) and trials. Time 0 significantly earlier in the restrictive than in the unrestrictive represents sentence onset. The vertical lines indicate critical time points condition. 
The fixations to the distractor objects were influ- averaged across trials. The main statistical analysis was performed on the enced in the opposite way, meaning that the fixation propor- time window between verb onset (BVerb On^) and noun onset (BNoun On^). Error bars indicate standard errors of the means tions to distractors in the restrictive condition decreased over 1112 Behav Res (2018) 50:1102–1115 relationship they found between an auditorily presented verb and psycholinguistics and everyday situations of naturalistic lan- itssyntacticargumentswasmediatedbythereal-worldcontextin guage processing in rich multimodal contexts. which the scenes were embedded. The design of the In general, the present study adds to the previous evidence sug- seminaturalistic scenes, however, lacked experimental control gesting that VR is a promising tool for solving the trade-off be- of certain critical aspects that are known to influence an ob- tween experimental control and ecological validity in psycholin- server’s eye movements, such as the direction of the eye gaze guistic research. A critical assumption for generalizing findings ofthedepictedagentandtheanimacyofthetargetobject.Thefact obtained in a VR context to everyday situations is that people thatweobservedasimilareffectwhilecontrollingforthesevisual behave similarly in similar situations in the virtual world and the andadditionallexicalstimulus characteristics confirms the valid- real world. Initial studies into language processing indicated that ity of the original effect in these respects. Moreover, we showed this assumption was met. 
Similar linguistic-priming effects oc- that the effects generalize to more naturalistic viewing conditions curredwhenparticipantsinteractedwitheitherahuman-likevirtual inwhichparticipantsobserve3-Dobjects.Theeffectofincreased agent or a real person (Heyselaar et al., 2017), and participants target fixations in the restrictive verb condition seems more pro- accommodated their speech rate andpitchtothespeechrateand nounced in our results than in the original study. This might be pitch of their virtual interlocutors (Gijssels et al., 2016;Staum related to the lack of filler items in the present study. All spoken Casasanto et al., 2010). Other recent studies have also suggested sentences referred to the presented scene, and the target object thatparticipantsbehavesimilarlyinproducingandcomprehending named in the sentence was always visually present. language in VR versus traditional experimental paradigms, in We also observed a significant effect of experimental condition terms of both behavioral and neurophysiological measures on the proportions of fixations to the distractor objects. Fixation (Peeters&Dijkstra,2017;Trompetal.,inpress).Thepresentstudy proportions to the distractors in the restrictive condition decreased is in line with this overall tendency, in showing that participants more strongly than in the unrestrictive condition. Broadly speak- predictedupcomingwordsinvirtualcontextsqualitativelysimilar- ing, this pattern mirrored the pattern of looks to the target objects. ly to predictions of upcoming words in traditional, nonimmersive This is an intuitive result, given the design of the verb lists. The experimental paradigms. Future studies performed in CAVE envi- verbsintherestrictiveconditionarecharacterizedbytheirsemantic ronmentsmaycombineeyetrackingwithrecordingofelectrophys- features,whichmakethedistractorobjectsimplausiblearguments. 
iological data in order to further investigate the neurocognitive Altmann and Kamide (1999) found the same tendency but did not underpinnings of prediction in rich, multimodal contexts. characterize the effect statistically. This discrepancy demonstrates At a methodological level, we showed the feasibility of com- that regression models may be more suitable to detecting subtle bining eyetracking and VR in a CAVE environment. A technical effectsineyetrackingdata.Theclearereffectinourdatacanalsobe issue that may be improved in future implementations is the test attributed to the higher number of trials per condition. for tracking accuracy of the in-built eyetracking device. For the Furthermore, the fact that target fixations increased more than present setup it was not possible to assess tracking accuracy distractor fixations decreased can be explained by our definition quantitatively, but only via a warning message from the tracking of ROIs. We coded only for fixations that were directed to the software during the initial calibration step. When tracking accu- predefined ROIs, and not to other parts of the display. The propor- racy was found to be too low, those participants were excluded tions per time bin therefore do not add up to 1. Fixations to other from the study. Additional assessments of tracking accuracy and parts of the display, such as the virtual agent, were not captured in calibration performance were performed offline using playback the data, which could account for the remaining proportions. software to visually inspect the gaze patterns. Participant exclu- Previous extensions of Altmann and Kamide’s(1999) para- sion should ideally be performed by assessing tracking accuracy digm had aimed to test the ecological validity of the original quantitatively over the course of the whole experiment. findings by using photographs of visual scenes (e.g., In sum, the present study showed verb-mediated predictive Andersson et al., 2011; Staub et al., 2012). 
These studies concep- languageprocessinginarich,multimodal3-Denvironment,there- tually replicated the original findings, although one study sug- by confirming the ecological validity of previous findings in gested that the original effects may have been restricted to situa- nonimmersive 2-D environments. We conclude that eyetracking tions in which only a limited number of objects were presented measures can reliably be used to study theoretically interesting (Sorensen & Bailey, 2007). The present study focused on a dif- phenomena in automated virtual environments that allow for ferent element that is present in naturalistic language processing richer and ecologically more valid forms of stimulus presentation inavisualcontextbutnotinthetypicalVWPstudy—namely,the than do traditional, screen-based experimental paradigms. 3-Dcharacterofobjectsthatarereferredtoandthecorresponding stereoscopic view that includes natural depth cues. Moreover, Acknowledgments Open access funding provided by Max Planck unlike in typical screen-based studies using the VWP, our partic- Society. ipants were allowed to move their heads, and their visual field was not limited to the fovea. These elements are critical in bridg- Author note We thank Albert Russel, Jeroen Derks, and Reiner ing the gap between traditional experimental paradigms in Dirksmeyer for technical and graphical support. Behav Res (2018) 50:1102–1115 1113 Appendix 1. 
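As an illustration of the preprocessing steps reported under Statistical analyses (60-Hz samples, a 12-sample latency correction, fixations of at least six consecutive samples on one ROI, 50-ms bins of three samples, and empirical logits), the following Python sketch may be helpful. It is not the authors' analysis code. The function names, the per-sample list of ROI labels, and the example data are hypothetical, and the empirical-logit and weight formulas are the standard definitions from Barr (2008), which the text cites but does not spell out.

```python
import math
from itertools import groupby

# Constants taken from the text; everything else below is an assumption.
LATENCY_SAMPLES = 12   # 200-ms eyetracker latency at 60 Hz
MIN_FIX_SAMPLES = 6    # a fixation lasts at least 100 ms
BIN_SAMPLES = 3        # 50-ms time bins


def correct_latency(samples, shift=LATENCY_SAMPLES):
    """Drop the first `shift` samples so that index 0 lines up with sentence onset.

    `samples` is assumed to start at the recorded sentence-onset marker and to
    hold one ROI label (or None for "no ROI hit") per 60-Hz sample.
    """
    return samples[shift:]


def keep_fixations(samples, min_len=MIN_FIX_SAMPLES):
    """Set to None every run on a single ROI that is shorter than `min_len` samples."""
    cleaned = []
    for roi, run in groupby(samples):
        run = list(run)
        keep = roi is not None and len(run) >= min_len
        cleaned.extend(run if keep else [None] * len(run))
    return cleaned


def bin_target_hits(samples, target, bin_size=BIN_SAMPLES):
    """Return (hits, n) per complete bin, where hits counts samples on `target`."""
    bins = []
    for start in range(0, len(samples) - bin_size + 1, bin_size):
        chunk = samples[start:start + bin_size]
        bins.append((sum(s == target for s in chunk), len(chunk)))
    return bins


def empirical_logit(y, n):
    """Empirical logit of y hits out of n, plus the regression weight (1 / variance)."""
    elog = math.log((y + 0.5) / (n - y + 0.5))
    variance = 1.0 / (y + 0.5) + 1.0 / (n - y + 0.5)
    return elog, 1.0 / variance


if __name__ == "__main__":
    # Hypothetical trial: 12 latency samples, a 7-sample look at the target,
    # a 3-sample look at a distractor (too short to count), then no ROI hit.
    raw = [None] * 12 + ["target"] * 7 + ["distractor1"] * 3 + [None] * 2
    cleaned = keep_fixations(correct_latency(raw))
    for hits, n in bin_target_hits(cleaned, "target"):
        print(hits, n, empirical_logit(hits, n))
```

The 0.5 added to both counts is what keeps the transform finite when a bin contains only hits or no hits at all. In an actual analysis, the counts would typically be aggregated over trials before the transform, and the weights passed to the mixed-effects model.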
Linguistic parameters of the verbs Table 2 Lexical characteristics of the verb materials in the restrictive and unrestrictive conditions Measure Restrictive Unrestrictive p Value Length (# letters) 6.00 (1.48) 6.50 (0.74) .43 Length (# syllables) 2.00 (0.00) 2.00 (0.00) .25 Length (# phonemes) 5.00 (1.48) 5.00 (1.48) .47 Frequency (SUBTLEX-NL) (log)* 2.93 (0.63) 3.05 (0.68) .46 Frequency (SUBTLEX-NL2) (log)* 3.40 (0.63) 3.53 (0.68) .42 Similarity to other words 1.48 (0.52) 1.45 (0.52) .40 Age of acquisition 6.21 (1.39) 6.72 (1.74) .32 Word prevalence 2.75 (0.21) 2.80 (0.38) .95 Reaction time 526.50 (31.88) 520.00 (22.98) .81 Accuracy .98 (.03) .98 (.03) .52 Coltheart N 7.50 (8.15) 7.00 (7.41) .52 Normally distributed parameters are marked with an asterisk (*). Listed are the mean values with SDs, for normally distributed parameters, or median values with median absolute deviations, otherwise. Appendix 2. Stimulus materials Table 3 List of verb pairs and the four object names for each scene, all in Dutch Scene No. 
Scene No. | Restrictive Verb | Unrestrictive Verb | Target Object | Distractor 1 | Distractor 2 | Distractor 3
1 | bakken | bezorgen | frietjes | kruk | microfoon | tennisbal
2 | bespelen | reinigen | piano | tandenborstel | kachel | spiegel
3 | besturen | wassen | auto | broek | kom | schep
4 | breken | vergeten | wijnfles | rekenmachine | trui | muts
5 | dekken | poetsen | tafel | wereldbol | schoenen | map
6 | drinken | bereiden | koffie | biefstuk | tomaat | mais
7 | duwen | bezien | kinderwagen | wafel | poster | open haard
8 | eten | dragen | watermeloen | gieter | stoel | halter
9 | horen | testen | telefoon | lippenstift | pan | basketbal
10 | installeren | winnen | computer | elpee | bloem | voetbal
11 | kappen | filmen | boom | sporttas | paraplu | koffer
12 | knopen | grijpen | stropdas | hamburger | aansteker | kam
13 | koken | bekijken | pompoen | ballon | viool | wasbak
14 | kraken | nuttigen | walnoot | donut | tompouce | muffin
15 | lezen | kopen | krant | druiven | oven | gitaar
16 | likken | kiezen | lolly | zaag | pion | veer
17 | openen | beschrijven | deur | mand | thee | hoed
18 | persen | vangen | sinaasappel | magneet | pijp | bril
19 | plukken | nemen | tomaatje | ijsje | bord | horloge
20 | repareren | stelen | fietspomp | sigaar | dennenappel | banaan
21 | roken | pakken | sigaretten | beker | fiets | tas
22 | roosteren | ontvangen | boterham | vlag | jas | schaar
23 | ruiken | gooien | parfum | mes | skateboard | cadeau
24 | schillen | wegen | appel | emmer | doos | sleutel
25 | slikken | checken | pillen | koptelefoon | laptop | bel
26 | sluiten | begeren | boek | kopje | papier | ei
27 | smelten | zoeken | chocolade | ontstopper | mobiel | gum
28 | snijden | proeven | taart | soep | wijn | cocktail
29 | stoppen | lenen | radio | zonnebril | kaars | pen
30 | tellen | verstoppen | ballen | hamer | lamp | paprika
31 | versturen | verplaatsen | brief | kleerhanger | liniaal | TV
32 | volgen | tekenen | wegwijzer | pizza | kan | bezem

Depending on the experimental condition, either the restrictive or the unrestrictive verb was used in the spoken sentence.

Table 4 Stimulus materials translated into English

Scene No. | Restrictive Verb | Unrestrictive Verb | Target Object | Distractor 1 | Distractor 2 | Distractor 3
1 | bake | deliver | fries | stool | microphone | tennis ball
2 | play | clean | piano | toothbrush | stove | mirror
3 | drive | wash | car | pants | bowl | shovel
4 | break | forget | wine bottle | calculator | jumper | hat
5 | set | clean | table | globe | shoes | folder
6 | drink | prepare | coffee | steak | tomato | corn
7 | push | look at | buggy | waffle | poster | fireplace
8 | eat | carry | watermelon | watering can | chair | barbell
9 | hear | test | telephone | lipstick | pan | basketball
10 | install | win | computer | LP | flower | football
11 | cut | film | tree | sports bag | umbrella | suitcase
12 | knot | grab | tie | hamburger | lighter | comb
13 | cook | look at | pumpkin | balloon | violin | sink
14 | crack | eat | walnut | donuts | cake | muffin
15 | read | buy | newspaper | grapes | oven | guitar
16 | lick | choose | lolly | saw | pawn | feather
17 | open | describe | door | basket | tea | hat
18 | squeeze | catch | orange | magnet | pipe | glasses
19 | pick | take | tomato | ice cream | plate | watch
20 | fix | steal | bike pump | cigar | pine cone | banana
21 | smoke | take | cigarettes | cup | bike | bag
22 | roast | get | slice of bread | flag | jacket | scissors
23 | smell | throw | perfume | knife | skateboard | gift
24 | peel | weigh | apple | bucket | box | key
25 | swallow | check | pills | headphones | laptop | bell
26 | close | want | book | cup | paper | egg
27 | melt | search | chocolate | plunger | mobile phone | eraser
28 | cut | taste | cake | soup | wine | cocktail
29 | stop | borrow | radio | sunglasses | candle | pencil
30 | count | hide | balls | hammer | lamp | paprika
31 | send | move | letter | hanger | ruler | television
32 | follow | draw | sign | pizza | can | broom

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Behavior Research Methods – Springer Journals
Published: Aug 8, 2017