Dissociating the time courses of the cross-modal semantic priming effects elicited by naturalistic sounds and spoken words

Psychon Bull Rev (2018) 25:1138–1146
DOI 10.3758/s13423-017-1324-6

BRIEF REPORT

Yi-Chuan Chen & Charles Spence

Published online: 9 June 2017
© The Author(s) 2017. This article is an open access publication

Abstract  The present study compared the time courses of the cross-modal semantic priming effects elicited by naturalistic sounds and spoken words on visual picture processing. Following an auditory prime, a picture (or a blank frame) was briefly presented and then immediately masked. The participants had to judge whether or not a picture had been presented. Naturalistic sounds consistently elicited a cross-modal semantic priming effect on visual sensitivity (d') for pictures (higher d' in the congruent than in the incongruent condition) at the 350-ms, rather than at the 1,000-ms, stimulus onset asynchrony (SOA). Spoken words mainly elicited a cross-modal semantic priming effect at the 1,000-ms rather than at the 350-ms SOA, but this effect was modulated by the order in which the two SOAs were tested. It would therefore appear that visual picture processing can be rapidly primed by naturalistic sounds via cross-modal associations, and that this effect is short-lived. In contrast, spoken words prime visual picture processing over a wider range of prime–target intervals, though this effect was conditioned by the prior context.

Keywords  Semantic · Multisensory · Audiovisual · Sensitivity · Priming

In daily life, hearing the sound of a dog barking is likely informative with regard to the identity of a creature that is glimpsed, albeit briefly (Chen & Spence, 2010). Indeed, the presentation of either a naturalistic sound or a spoken word enhances the sensitivity (d') of visual object detection (Chen & Spence, 2011; Lupyan & Ward, 2013). Such results suggest that the meaning of the auditory cue facilitates visual processing and boosts the breakthrough of the visual stimulus into awareness cross-modally, rather than simply giving rise to some sort of criterion change (note that a dog barking certainly invites the guess that the creature might be a dog as well).

The time courses of cross-modal semantic priming effects, however, appear to be different for naturalistic sounds and spoken words. Chen and Spence (2011) demonstrated that, when leading the target picture by 346 ms, only naturalistic sounds (rather than spoken words) elicited a semantic priming effect on visual picture sensitivity in a simple detection task (when judging whether a picture was present or not). These results were explained on the basis of evidence suggesting that naturalistic sounds access their associated meaning faster than spoken words do (Chen & Spence, 2013; Cummings et al., 2006; Saygin, Dick, & Bates, 2005). The different processing times plausibly stem from the differing routes of semantic access for each type of auditory stimulus: Naturalistic sounds access semantic information directly, whereas spoken words have to access their meanings via lexical representations (Barsalou, Santos, Simmons, & Wilson, 2008; Chen & Spence, 2011; Glaser & Glaser, 1989).

Lupyan and colleagues, on the other hand, demonstrated an advantage for spoken words over naturalistic sounds at longer SOAs (around 1,000 ms or more; Edmiston & Lupyan, 2015; Lupyan & Thompson-Schill, 2012¹). The participants in their studies had to verify whether the auditory cue (either a naturalistic sound or a spoken word) and the subsequently presented picture matched or not. The results demonstrated that the participants' reaction times (RTs) were shorter for spoken words than for naturalistic sounds. Further evidence comes from an event-related potential (ERP) study: When a spoken word led a target picture by around 1,670 ms, the P1 component associated with the picture (at 70–125 ms after onset) occurred earlier in the congruent than in the incongruent condition, but no such congruency effect was induced by naturalistic sounds (Boutonnet & Lupyan, 2015). These results were explained in terms of spoken words being associated with semantic representations that are more abstract and categorical, thus providing a conceptual cue regarding a given object that is general rather than specific to a particular exemplar, as compared to naturalistic sounds (Edmiston & Lupyan, 2015; Lupyan & Thompson-Schill, 2012).

Given the different SOAs and methods, and given the different mechanisms proposed by previous research (Chen & Spence, 2011; Edmiston & Lupyan, 2015; Lupyan & Thompson-Schill, 2012), we wanted to carefully examine the time courses of the cross-modal semantic priming effects elicited by naturalistic sounds and spoken words. Two critical SOAs were chosen: The 350-ms SOA is close to the interval at which Chen and Spence (2011) demonstrated cross-modal semantic priming by naturalistic sounds (but not by spoken words) in a picture detection task. The 1,000-ms SOA (the interstimulus interval, ISI, was 500–650 ms) corresponds to an ISI somewhere between the 400 and 1,000 ms used by Lupyan and Thompson-Schill (2012), conditions that demonstrated a cross-modal semantic advantage for spoken words over naturalistic sounds. In Experiment 1, each participant was tested with only one of the SOAs, following the designs of Chen and Spence (2011) and Lupyan and Thompson-Schill (2012). In Experiment 2, participants were tested with both SOAs in a counterbalanced order. In this case, we further examined whether the time courses of the cross-modal semantic priming effects are stable or are modulated by the prior context.

¹ The interstimulus interval was 400, 1,000, or 1,500 ms in Lupyan and Thompson-Schill (2012). However, since they did not report the duration of the auditory cue, it is not possible to determine the corresponding SOAs. Nevertheless, the SOAs were certainly longer than the 346 ms in Chen and Spence (2011).

* Yi-Chuan Chen, yi-chuan.chen@psy.ox.ac.uk
Crossmodal Research Laboratory, Department of Experimental Psychology, University of Oxford, 9 South Parks Road, Oxford OX1 3UD, UK

Experiment 1

Method

Participants

Forty volunteers (10 males, mean age 22.2 years) took part in this experiment in exchange for course credit or five pounds (UK sterling). The participants were native English speakers or bilinguals who had started to learn English by 5 years of age. All participants had normal or corrected-to-normal vision and normal hearing by self-report, and all were naïve as to the purpose of the study. Written informed consent was obtained prior to the start of the study. The study was approved by the Medical Sciences Inter-Divisional Research Ethics Committee, University of Oxford (MSD-IDREC-C1-2014-143).

Apparatus and stimuli

The visual stimuli were presented on a 23-inch LED monitor controlled by a personal computer. The participants sat at a viewing distance of 58 cm from the monitor in a dimly lit chamber. Twenty-four outline drawings (12 living and 12 nonliving things) taken from Snodgrass and Vanderwart (1980) and Bates et al. (2003), as well as their mirror images, were used as visual targets (see Appendix). Five pattern masks were created by overlapping 20 nonobject figures randomly selected from Magnié, Besson, Poncet, and Dolisi (2003). Each pattern covered an area of 5.9° × 5.9°, sufficient to completely occlude all of the target pictures.

The auditory stimuli (8-bit mono; 22,500-Hz digitization) were presented over closed-ear headphones and ranged in loudness from 31 to 51 dB sound pressure level (SPL). The naturalistic sounds were those produced by each of the objects. The spoken words consisted of the most commonly agreed-upon name used to refer to each picture (Bates et al., 2003; Snodgrass & Vanderwart, 1980) and were produced by a female native English speaker. The naturalistic sound and the spoken word associated with the same picture were edited to have the same duration. The root mean square values of all of the auditory stimuli were equalized.

Design

Two within-participants factors, prime type (naturalistic sound or spoken word) and congruency (congruent or incongruent), and one between-participants factor, SOA (350 or 1,000 ms), were manipulated. Naturalistic sounds and spoken words were presented in separate blocks of trials. Congruent and incongruent trials were mixed within blocks: The auditory cue matched the picture in the congruent trials, whereas the two belonged to different categories, based on the fundamental living-thing versus nonliving-thing separation, in the incongruent trials. Each SOA was tested with 20 participants.

All 24 pictures and their mirror images were presented once in each block—either one was presented in the congruent trial and the other in the incongruent trial (and they were swapped in another block). These trials were used to estimate the participant's hit rate in the congruent and incongruent conditions, respectively. An additional 48 picture-absent trials, consisting of an auditory cue and a blank frame, were presented to estimate the participant's false alarm (FA) rate. These 96 trials were presented in a completely randomized order. There were two blocks for both naturalistic sounds and spoken words, and the order of these two types of auditory stimuli was counterbalanced across participants. The participants were not given any information concerning the possible semantic congruency between the auditory cue and the picture prior to taking part in the study.

Procedure

The participants initiated a block of trials by pressing the enter key on the keyboard in front of them. In each trial (see Fig. 1a), a blank frame was followed by either a frame with a picture or another blank for 17 ms (one frame at the screen refresh rate of 60 Hz). The pattern mask was presented immediately thereafter; meanwhile, the participants had to decide whether they had seen a picture (irrespective of its identity) presented before the mask by pressing the space bar. The participants were informed that the task was not speeded, and that they should only respond if they were sure that they had seen a picture (i.e., they should maintain a strict response criterion).

Prior to the start of the main experiment, all of the pictures and their matched names were presented on the monitor in a completely randomized order across participants. Each picture–name pair was presented for 1,500 ms, interleaved with a blank frame for 500 ms. An easy practice session (eight trials with a picture duration of 33 ms) and a harder practice session (16 trials with a picture duration of 17 ms) were conducted prior to the main experiment. In the easy practice session, accuracy had to reach 85%, or the session was repeated up to three times. The stimuli in the practice sessions were not used in the main experiment. The experiment lasted for approximately 30 minutes.

Results

For both naturalistic sounds and spoken words, the hit rate in the congruent and incongruent conditions was estimated on the basis of 48 trials (24 pictures × 2 blocks), while the FA rate was estimated on the basis of 96 trials (48 picture-absent trials × 2 blocks; see Table 1). The d' values were calculated based on the hit and FA rates (see Figs. 1b–2) and then submitted to a three-way analysis of variance (ANOVA) with the factors of congruency, prime type, and SOA (see Table 2 for the results). Critically, there was a significant three-way interaction. Paired t tests (Holm-Bonferroni corrected; one-tailed tests were used because higher d' in the congruent than in the incongruent condition was expected) demonstrated a congruency effect of naturalistic sounds at the 350-ms SOA, t(19) = 2.81, p < .05, but not at the 1,000-ms SOA, t(19) = -1.63, p = .12; in contrast, the congruency effect of spoken words occurred at the 1,000-ms SOA, t(19) = 2.87, p < .05, but not at the 350-ms SOA, t(19) = 0.32, p = .75. We therefore replicated the results at the 350-ms SOA reported in Chen and Spence (2011).

Fig. 1  (a) Sequence of three frames presented in each trial: a blank (1,483 ms), a target picture (e.g., a dog; 17 ms), and a pattern mask (1,500 ms). The target picture and pattern mask were presented in black in the center of a white background. (b) The current experimental design in terms of signal detection theory. The distributions drawn with dashed, dotted, and solid lines represent the internal responses in the target-present/congruent, target-present/incongruent, and target-absent conditions, respectively. In this design, the congruent and incongruent conditions share the same FA rate. The sensitivity (d') was calculated using the equation d' = z(hit rate) − z(FA rate) in the congruent and incongruent conditions, separately (Green & Swets, 1966; Macmillan & Creelman, 2005)
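The d' computation just described can be sketched in a few lines; this is a minimal illustration of the standard signal-detection formula, using hypothetical hit and FA rates rather than the reported data:

```python
from statistics import NormalDist

def d_prime(hit_rate: float, fa_rate: float) -> float:
    """Sensitivity d' = z(hit rate) - z(FA rate), where z is the inverse
    of the standard normal CDF (Green & Swets, 1966)."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

# Congruent and incongruent conditions share the same FA rate, as in
# the present design (illustrative rates only).
fa = 0.07
print(round(d_prime(0.83, fa), 2))  # congruent condition
print(round(d_prime(0.79, fa), 2))  # incongruent condition
```

Note that rates of exactly 0 or 1 make z undefined; in practice such extreme rates are usually adjusted (e.g., by a 1/(2N) correction; Macmillan & Creelman, 2005) before computing d'.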
Table 1  Percentages of hits and false alarms (FA; SE in parentheses) in each of the conditions in Experiment 1

SOA (ms)  Sound type          Hit rate, congruent  Hit rate, incongruent  FA rate
350       Naturalistic sound  83.4 (3.8)           79.1 (4.0)             7.1 (2.6)
          Spoken word         84.0 (3.6)           81.9 (4.6)             6.7 (2.1)
1,000     Naturalistic sound  65.3 (3.7)           68.3 (4.2)             10.3 (4.0)
          Spoken word         73.8 (3.5)           67.3 (3.9)             12.4 (3.6)

Fig. 2  Mean sensitivity (d') at the 350- and 1,000-ms SOAs in Experiment 1. Error bars indicate ±1 SEM. Sound = naturalistic sounds; Word = spoken words

Experiment 2

Method

Thirty-six volunteers (seven males, mean age 19.7 years) took part in this experiment. Three factors were tested with all participants: prime type (naturalistic sound or spoken word), congruency (congruent or incongruent), and SOA (350 or 1,000 ms). A fourth factor, the order in which the SOAs were tested, was manipulated between participants: Half of the participants were tested with the 350-ms SOA in the first session and the 1,000-ms SOA in the second session (Group 1: 350–1,000 ms); the order was reversed for the remainder of the participants (Group 2: 1,000–350 ms). The stimuli and task were the same as in Experiment 1. The experiment took an hour to complete.

Results

The participants' d' values (see Fig. 3) were calculated based on the hit and FA rates in each condition (see Table 3), and then submitted to a four-way ANOVA (see Table 4a for the results). There was a significant three-way interaction between congruency, prime type, and order. Two separate two-way ANOVAs for each prime type, with the factors of congruency and order, demonstrated that the congruency effect was significant for naturalistic sounds, F(1, 34) = 7.50, p < .05, ηp² = 0.18, without being modulated by order (Congruency × Order: F < 1, p = .87, ηp² = 0.001). For spoken words, however, the congruency effect was modulated by order (Congruency × Order: F(1, 34) = 12.41, p < .005, ηp² = 0.27). Post hoc tests demonstrated that the congruency effect of spoken words was significant in Group 2 (1,000–350 ms), t(17) = 6.18, p < .001, but not in Group 1 (350–1,000 ms), t(17) = 0.65, p = .53. These results therefore suggest that the SOA order influenced the cross-modal semantic congruency effect elicited by spoken words, but not that elicited by naturalistic sounds. Such a carryover effect of the SOA from one session to the next may mask the modulation of the congruency effect by SOA. The data from the two sessions were therefore analyzed separately.

When only the data from the first session were included (top row in Fig. 3), a three-way ANOVA with the factors of congruency, prime type, and SOA (with SOA as a between-participants factor) was conducted (see Table 4b). This is the same design as in Experiment 1, and the results were replicated: The three-way interaction was significant. Paired t tests demonstrated that the congruency effect of naturalistic sounds was only observed at the 350-ms SOA, t(17) = 2.50, p < .05, but not at the 1,000-ms SOA, t(17) = -0.27, p = .79. In contrast, the congruency effect of spoken words was only statistically significant at the 1,000-ms SOA, t(17) = 3.94, p < .005, but not at the 350-ms SOA, t(17) = 0.50, p = .62.

The results of the second session (bottom row in Fig. 3) were different from those of the first session (see Table 4c). The significant interaction between congruency and SOA was attributable to the fact that the congruency effect was only significant at the 350-ms SOA, t(17) = 4.69, p < .001, but not at the 1,000-ms SOA, t(17) = 0.16, p = .87. Planned comparisons demonstrated that the congruency effect was significant at the 350-ms SOA for both naturalistic sounds, t(17) = 2.98, p < .05, and spoken words, t(17) = 4.30, p < .001, but for neither at the 1,000-ms SOA, t(17) = -0.57, p = .58, and t(17) = 0.65, p = .53, respectively. The significant interaction between congruency and prime type reflected the congruency effect being significant for spoken words, t(35) = 3.48, p < .005, but only marginally significant for naturalistic sounds, t(35) = 1.88, p = .07. The latter perhaps results from the slightly higher d' in the incongruent than in the congruent condition at the 1,000-ms SOA.

General discussion

The results of the present study demonstrate that the presentation of naturalistic sounds enhanced the visual sensitivity to semantically congruent pictures at a shorter SOA (350 ms) than spoken words did (1,000 ms) when each participant encountered only one of the SOAs. The cross-modal semantic priming effects elicited by the presentation of naturalistic sounds versus spoken words can therefore be dissociated in terms of their differing time courses. Furthermore, naturalistic sounds consistently primed the visual pictures at the short SOA; in contrast, the priming effect elicited by spoken words was significantly modulated by the SOA tested beforehand. Specifically, when the 1,000-ms SOA (demonstrating a significant priming effect) was tested first, the priming effect carried over to the 350-ms SOA (which was not observed in Experiment 1). However, when the 350-ms SOA (where no priming effect was observed) was tested first, the priming effect at the 1,000-ms SOA was eliminated as well. Finally, higher sensitivity at the 350-ms than at the 1,000-ms SOA was observed in both experiments. This can be explained by an attentional cuing effect elicited by the presentation of a temporally close auditory cue (McDonald, Teder-Sälejärvi, & Hillyard, 2000).

That naturalistic sounds elicited the cross-modal semantic priming effect faster (i.e., at a shorter SOA) than spoken words did suggests that the time required to access meaning is shorter for the former (Chen & Spence, 2013; Cummings et al., 2006; Saygin et al., 2005). Consistent evidence comes from the results of ERP studies: For instance, Murray, Camen, Andino, Bovet, and Clarke (2006) demonstrated that the brain activities associated with naturalistic sounds produced by living versus nonliving things can be discriminated around 70 to 119 ms after sound onset. The component associated with the meaning of spoken words (the N400; Kutas & Hillyard, 1980), on the other hand, typically starts 200 ms after word onset, and it can be delayed if the word is longer or shares initial syllables with other words (Van Petten, Coulson, Rubin, Plante, & Parks, 1999). These results may partly be attributed to the nature of the acoustic signals associated with each type of stimulus: Naturalistic sounds associated with different object categories have time–frequency spectra that are distinct from one another (e.g., Murray et al., 2006). By contrast, spoken words become comprehensible only once the acoustic signals have been abstracted into phonetic representations, which are then used to access the associated lexical representations (Obleser & Eisner, 2009). Consequently, a semantic network suggests that naturalistic sounds and visual pictures access semantics directly, whereas spoken words access their meanings via lexical representations (Chen & Spence, 2011, 2017; Glaser & Glaser, 1989). Hence, the cross-modal semantic interactions between naturalistic sounds and pictures would be expected to occur more rapidly than those between spoken words and pictures, as demonstrated in the present study.
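As an aside on the statistics reported above, the Holm-Bonferroni step-down procedure applied to the families of paired t tests can be sketched as follows; this is an illustrative implementation with made-up p-values, not the reported ones:

```python
def holm_bonferroni(pvals, alpha=0.05):
    """Holm's step-down procedure: compare the i-th smallest p-value
    (0-indexed rank) against alpha / (m - i), stopping at the first
    failure. Returns reject/retain decisions in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvals[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # all larger p-values are retained as well
    return reject

# Four one-tailed paired tests (illustrative p-values)
print(holm_bonferroni([0.006, 0.12, 0.004, 0.75]))
# → [True, False, True, False]
```

The step-down structure is what makes Holm's method uniformly more powerful than a plain Bonferroni correction while still controlling the family-wise error rate.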
Table 2  Results of the analysis of sensitivity (d') in Experiment 1 (three-way ANOVA: Congruency × Prime Type × SOA)

Effect                          F(1, 38)  p      ηp²   Note
Congruency                      5.92      <.05   0.14  Congruent (2.58) > Incongruent (2.50)
SOA                             11.22     <.005  0.23  350 ms (2.99) > 1,000 ms (2.10)
Congruency × Prime Type × SOA   9.99      <.005  0.21

Fig. 3  Mean sensitivity (d') at the 350- and 1,000-ms SOAs in the first session (top row) and the second session (bottom row) for Group 1 (tested in the order 350- and then 1,000-ms SOA) and Group 2 (with the order reversed) in Experiment 2. Error bars indicate ±1 SEM. Sound = naturalistic sounds; Word = spoken words

Table 3  Percentages of hits and false alarms (FA; SE in parentheses) in each of the conditions in Experiment 2

Group    SOA (ms)                Sound type          Hit rate, congruent  Hit rate, incongruent  FA rate
Group 1  350 (first session)     Naturalistic sound  88.0 (2.3)           81.0 (4.3)             3.2 (1.4)
                                 Spoken word         82.6 (3.8)           81.4 (4.0)             2.2 (0.6)
         1,000 (second session)  Naturalistic sound  71.9 (3.7)           72.3 (4.1)             2.5 (0.9)
                                 Spoken word         71.4 (4.4)           69.0 (5.2)             1.4 (0.4)
Group 2  350 (second session)    Naturalistic sound  80.7 (4.0)           75.8 (5.1)             5.6 (3.3)
                                 Spoken word         79.2 (3.8)           67.4 (5.4)             3.5 (2.2)
         1,000 (first session)   Naturalistic sound  69.4 (4.0)           69.7 (4.5)             4.7 (2.2)
                                 Spoken word         71.1 (4.9)           64.0 (5.9)             5.5 (2.3)

Table 4  Results of the analysis of sensitivity (d') in Experiment 2

Effect                                F(1, 34)  p      ηp²   Note
(a) Four-way ANOVA (Congruency × Prime Type × SOA × Order)
Congruency                            17.85     <.001  0.34  Congruent (2.99) > Incongruent (2.85)
SOA                                   45.32     <.001  0.57  350 ms (3.13) > 1,000 ms (2.70)
Congruency × SOA                      10.15     <.005  0.23
Congruency × Prime Type               6.46      <.05   0.16
Congruency × Order                    4.49      <.05   0.12
Congruency × Prime Type × Order       16.95     <.001  0.33
(b) First session: three-way ANOVA (Congruency × Prime Type × SOA)
Congruency                            8.85      <.01   0.21  Congruent (2.98) > Incongruent (2.85)
SOA                                   6.65      <.05   0.16  350 ms (3.26) > 1,000 ms (2.58)
Congruency × Prime Type × SOA         12.24     <.005  0.27
(c) Second session: three-way ANOVA (Congruency × Prime Type × SOA)
Congruency                            15.83     <.001  0.32  Congruent (2.99) > Incongruent (2.84)
Congruency × SOA                      14.41     <.005  0.30
Congruency × Prime Type               4.96      <.05   0.13

At the 1,000-ms SOA, only spoken words, but not naturalistic sounds, gave rise to cross-modal semantic priming effects, thus suggesting that the effect induced by naturalistic sounds is short-lived (see also Chen & Spence, 2017; Kim, Porter, & Goolkasian, 2014, using a picture categorization task). Given that naturalistic sounds can access their meaning rapidly (within 350 ms in the current study), the short-lived priming effect suggests that the activated meaning would be forgotten rapidly as well, unless the information can be temporarily maintained. The maintenance of representations of naturalistic sounds, nevertheless, is underpinned by auditory imagery capabilities, or else by their being transferred into lexical codes and stored in the phonological loop (Snyder & Gregg, 2011; Soemer & Saito, 2015), and both processes take extra time or cognitive resources. In contrast, spoken words essentially have the benefit of being maintained in the phonological loop in the working memory system (Baddeley, 2012), thus leading to a significant priming effect over a greater range of SOAs than naturalistic sounds (current study; Chen & Spence, 2017).

The final contrast lies in the fact that the time course of the cross-modal semantic priming effect elicited by naturalistic sounds was stable, whereas that elicited by spoken words was modulated by the prior context (i.e., the order in which the SOAs were tested). Audiovisual integration/interactions involving speech sounds have been demonstrated to be flexible. For example, the cross-modal semantic priming effect of spoken words can be speeded up so as to be observed at around the 350-ms SOA if the participants have previously been exposed to the longer-SOA condition (the current study), or if the participants have to identify the target picture by reporting its name (Chen & Spence, 2011). In addition, the integration of verbal cues and visual lip movements occurs more often (as indexed by a larger McGurk effect; McGurk & MacDonald, 1976) if a series of congruent (as compared to incongruent) audiovisual speech stimuli has been presented beforehand (Nahorna, Berthommier, & Schwartz, 2012). Finally, digits and letters that are presented subliminally to both vision and audition are integrated only if the participants have consciously experienced these pairings prior to the test (Faivre, Mudrik, Schwartz, & Koch, 2014). The higher flexibility of audiovisual interactions involving spoken words than of those involving naturalistic sounds at the semantic level perhaps stems from the former accessing semantic representations at an abstract, categorical, and modality-insensitive level, whereas the latter serve as modality-specific and context-dependent attributes associated with the object cross-modally (Edmiston & Lupyan, 2015; Waxman & Gelman, 2009).

Together, the results of the two experiments reported here demonstrate that naturalistic sounds elicit more rapid cross-modal priming than do spoken words, which is likely determined by their speed of semantic access stemming from different processing routes. On the other hand, the advantage of spoken words over naturalistic sounds in priming visual pictures across a more prolonged prime–target interval should result from the former being better maintained in working memory. Finally, consistent with previous studies, the interactions between spoken words and visual signals are flexible—that is, they can be enhanced or inhibited by the prior context or by task demands.

Acknowledgements  The authors are supported by the Arts and Humanities Research Council (AHRC), Rethinking the Senses grant (AH/L007053/1).

Appendix

The auditory stimuli used in the present study. Note that the lengths of the naturalistic sound and the spoken word referring to the same picture were matched at 350 ms for the 14 one-syllable words, 450 ms for the seven two-syllable words, and 500 ms for the three words of three or more syllables. The identification accuracy, confidence, and familiarity ratings (maximum score = 7) for the sounds reflect the mean performance of 18 participants (four males, mean age 28 years; reported in Chen & Spence, 2011). The ratings of imagery concordance (maximum score = 5) were acquired via an online study (30 participants for naturalistic sounds, 18 males, mean age 31 years; 30 participants for spoken words, 19 males, mean age 33 years). All scores were lower for naturalistic sounds than for spoken words. Identification accuracy: t(23) = 5.42, p < .001; confidence rating: t(25) = 8.33, p < .001; familiarity rating: t(26) = 8.38, p < .001; imagery concordance: t(24) = 4.24, p < .001; two-tailed, unequal variance assumed.

                                                   Naturalistic sounds                          Spoken words
Picture        Naturalistic      Duration  ID acc.  Mean   Mean   Mean imagery  ID acc.  Mean   Mean   Mean imagery
(spoken word)  sound             (ms)      (%)      conf.  fam.   concordance   (%)      conf.  fam.   concordance
Bird           Bird chirping     350       77.8     4.7    4.8    4.4           100      6.9    6.9    4.7
Cat            "Meow"            350       100      6.2    6.3    4.4           100      7.0    7.0    4.8
Cow            "Moo"             350       88.9     6.4    6.3    4.8           100      6.9    7.0    4.8
Dog            "Woof"            350       94.4     6.4    6.4    4.3           100      6.9    6.9    4.8
Duck           "Quack"           350       100      6.8    6.3    4.5           100      6.9    6.9    4.8
Eagle          Eagle call        450       55.6     4.9    4.6    4.5           100      6.8    6.9    4.8
Elephant       Roar              500       61.1     4.8    4.6    4.7           100      6.9    6.9    4.9
Frog           "Ribbit"          350       83.3     5.1    5.1    4.2           100      7.0    7.0    4.9
Goat           "Baa-baa"         350       72.2     5.6    5.4    4.4           100      6.9    6.7    4.9
Horse          Horse neighing    350       61.1     5.2    4.9    4.0           94       6.2    6.7    4.9
Pig            "Oink"            350       50.0     5.2    5.4    4.2           100      6.9    7.0    4.9
Rooster        Crowing sound     450       88.9     5.8    5.6    4.7           100      6.9    6.2    4.8
Car            Car starting      350       77.8     5.4    5.8    4.0           100      7.0    7.0    4.6
Door           Creak             350       88.9     4.8    5.2    4.3           100      6.9    6.8    4.7
Drum           Banging of a drum 350       94.4     5.8    5.8    4.3           100      7.0    6.8    4.8
Guitar         Guitar sound      450       88.9     6.0    6.4    4.8           100      7.0    6.9    4.9
Gun            Gunshot           350       61.1     4.3    4.1    3.6           100      6.9    6.9    4.8
Motorcycle     "Vroom-vroom"     500       33.3     4.8    5.1    3.4           100      6.9    6.7    4.7
Piano          Piano sound       450       94.4     6.1    6.2    4.8           100      7.0    6.9    4.7
Scissors       Clipping scissors 450       22.2     3.2    3.8    3.7           100      6.9    6.9    4.9
Switch         Click             350       27.8     3.8    4.2    2.0           100      6.6    6.5    4.6
Telephone      Telephone ringing 500       100      6.6    6.7    4.7           100      7.0    7.0    4.6
Trumpet        Trumpet sound     450       83.3     5.3    5.3    4.6           100      7.0    6.7    4.9
Whistle        Whistling         450       66.7     5.4    5.5    4.8           94.4     6.8    6.6    4.9
Mean (SEM)                                 60.9 (7.9)  5.2 (0.2)  5.2 (0.2)  4.3 (0.1)  98.7 (0.9)  6.9 (0.1)  6.8 (0.1)  4.8 (0.02)
The stimuli used in the practice sessions were fly (humming fly), tiger (tiger roaring), bell (bell ringing), and cannon (cannon fire) pictures and environmental sounds.

Open Access  This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

References

Baddeley, A. (2012). Working memory: Theories, models, and controversies. Annual Review of Psychology, 63, 1–29.
Barsalou, L. W., Santos, A., Simmons, W. K., & Wilson, C. D. (2008). Language and simulation in conceptual processing. In M. de Vega, A. Glenberg, & A. Graesser (Eds.), Symbols and embodiment: Debates on meaning and cognition (pp. 245–283). Oxford: Oxford University Press.
Bates, E., D'Amico, S., Jacobsen, T., Székely, A., Andonova, E., Devescovi, A., & Tzeng, O. (2003). Timed picture naming in seven languages. Psychonomic Bulletin & Review, 10, 344–380.
Boutonnet, B., & Lupyan, G. (2015). Words jump-start vision: A label advantage in object recognition. Journal of Neuroscience, 35, 9329–9335.
Chen, Y.-C., & Spence, C. (2010). When hearing the bark helps to identify the dog: Semantically-congruent sounds modulate the identification of masked pictures. Cognition, 114, 389–404.
Chen, Y.-C., & Spence, C. (2011). Crossmodal semantic priming by naturalistic sounds and spoken words enhances visual sensitivity. Journal of Experimental Psychology: Human Perception and Performance, 37, 1554–1568.
Chen, Y.-C., & Spence, C. (2013). The time-course of the cross-modal semantic modulation of visual picture processing by naturalistic sounds and spoken words. Multisensory Research, 26, 371–386.
Chen, Y.-C., & Spence, C. (2017). Comparing audiovisual semantic interactions between linguistic and non-linguistic stimuli. Manuscript submitted for publication.
Cummings, A., Čeponienė, R., Koyama, A., Saygin, A. P., Townsend, J., & Dick, F. (2006). Auditory semantic networks for words and natural sounds. Brain Research, 1115, 92–107.
Edmiston, P., & Lupyan, G. (2015). What makes words special? Words as unmotivated cues. Cognition, 143, 93–100.
Faivre, N., Mudrik, L., Schwartz, N., & Koch, C. (2014). Multisensory integration in complete unawareness: Evidence from audiovisual congruency priming. Psychological Science, 25, 2006–2016.
Glaser, W. R., & Glaser, M. O. (1989). Context effects in Stroop-like word and picture processing. Journal of Experimental Psychology: General, 118, 13–42.
Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. New York: Wiley.
Kim, Y., Porter, A. M., & Goolkasian, P. (2014). Conceptual priming with pictures and environmental sounds. Acta Psychologica, 146, 73–83.
Kutas, M., & Hillyard, S. A. (1980). Reading senseless sentences: Brain potentials reflect semantic incongruity. Science, 207, 203–205.
Lupyan, G., & Thompson-Schill, S. L. (2012). The evocative power of words: Activation of concepts by verbal and nonverbal means. Journal of Experimental Psychology: General, 141, 170–186.
Lupyan, G., & Ward, E. J. (2013). Language can boost otherwise unseen objects into visual awareness. Proceedings of the National Academy of Sciences of the United States of America, 110, 14196–14201.
Macmillan, N. A., & Creelman, C. D. (2005). Detection theory: A user's guide. Mahwah, NJ: Erlbaum.
Magnié, M. N., Besson, M., Poncet, M., & Dolisi, C. (2003). The Snodgrass and Vanderwart set revisited: Norms for object manipulability and for pictorial ambiguity of objects, chimeric objects, and nonobjects. Journal of Clinical and Experimental Neuropsychology, 25, 521–560.
McDonald, J. J., Teder-Sälejärvi, W. A., & Hillyard, S. A. (2000). Involuntary orienting to sound improves visual perception. Nature, 407, 906–908.
McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746–748.
Murray, M. M., Camen, C., Andino, S. L. G., Bovet, P., & Clarke, S. (2006). Rapid brain discrimination of sounds of objects. Journal of Neuroscience, 26, 1293–1302.
Nahorna, O., Berthommier, F., & Schwartz, J. L. (2012). Binding and unbinding the auditory and visual streams in the McGurk effect. Journal of the Acoustical Society of America, 132, 1061–1077.
Obleser, J., & Eisner, F. (2009). Pre-lexical abstraction of speech in the auditory cortex. Trends in Cognitive Sciences, 13, 14–19.
Saygin, A. P., Dick, F., & Bates, E. (2005). An on-line task for contrasting auditory processing in the verbal and nonverbal domains and norms for younger and older adults. Behavior Research Methods, 37, 99–110.
Snodgrass, J. G., & Vanderwart, M. (1980). A standardized set of 260 pictures: Norms for name agreement, image agreement, familiarity, and visual complexity. Journal of Experimental Psychology: Human Learning and Memory, 6, 174–215.
Snyder, J. S., & Gregg, M. K. (2011). Memory for sound, with an ear toward hearing in complex auditory scenes. Attention, Perception, & Psychophysics, 73, 1993–2007.
Soemer, A., & Saito, S. (2015). Maintenance of auditory-nonverbal information in working memory. Psychonomic Bulletin & Review, 22, 1777–1783.
Van Petten, C., Coulson, S., Rubin, S., Plante, E., & Parks, M. (1999). Time course of word identification and semantic integration in spoken language. Journal of Experimental Psychology: Learning, Memory, and Cognition, 25, 394–417.
Waxman, S. R., & Gelman, S. A. (2009). Early word-learning entails reference, not merely associations. Trends in Cognitive Sciences, 13, 258–263.

Dissociating the time courses of the cross-modal semantic priming effects elicited by naturalistic sounds and spoken words

Publisher: Springer US
Copyright © 2017 by The Author(s)
Subject: Psychology; Cognitive Psychology
ISSN: 1069-9384
eISSN: 1531-5320
DOI: 10.3758/s13423-017-1324-6


Psychon Bull Rev (2018) 25:1138–1146
DOI 10.3758/s13423-017-1324-6

BRIEF REPORT

Dissociating the time courses of the cross-modal semantic priming effects elicited by naturalistic sounds and spoken words

Yi-Chuan Chen & Charles Spence
Crossmodal Research Laboratory, Department of Experimental Psychology, University of Oxford, 9 South Parks Road, Oxford OX1 3UD, UK
Correspondence: Yi-Chuan Chen, yi-chuan.chen@psy.ox.ac.uk

Published online: 9 June 2017
© The Author(s) 2017. This article is an open access publication

Abstract

The present study compared the time courses of the cross-modal semantic priming effects elicited by naturalistic sounds and spoken words on visual picture processing. Following an auditory prime, a picture (or blank frame) was briefly presented and then immediately masked. The participants had to judge whether or not a picture had been presented. Naturalistic sounds consistently elicited a cross-modal semantic priming effect on visual sensitivity (d') for pictures (higher d' in the congruent than in the incongruent condition) at the 350-ms rather than at the 1,000-ms stimulus onset asynchrony (SOA). Spoken words mainly elicited a cross-modal semantic priming effect at the 1,000-ms rather than at the 350-ms SOA, but this effect was modulated by the order of testing these two SOAs. It would therefore appear that visual picture processing can be rapidly primed by naturalistic sounds via cross-modal associations, and that this effect is short lived. In contrast, spoken words prime visual picture processing over a wider range of prime-target intervals, though this effect was conditioned by the prior context.

Keywords: Semantic · Multisensory · Audiovisual · Sensitivity · Priming

In daily life, hearing the sound of a dog barking is likely informative with regard to the identity of a creature that is glimpsed, albeit briefly (Chen & Spence, 2010). Indeed, the presentation of either a naturalistic sound or a spoken word enhances the sensitivity (d') of visual object detection (Chen & Spence, 2011; Lupyan & Ward, 2013). Such results suggest that the meaning of the auditory cue facilitates visual processing and boosts the breakthrough of the visual stimulus into awareness cross-modally, rather than simply giving rise to some sort of criterion change (note that the dog barking certainly induces a likely guess that the creature might be a dog as well).

The time courses of cross-modal semantic priming effects, however, appear to be different for naturalistic sounds and spoken words. Chen and Spence (2011) demonstrated that, when leading the target picture by 346 ms, only naturalistic sounds (rather than spoken words) elicited a semantic priming effect on visual picture sensitivity in a simple detection task (when judging whether a picture was present or not). These results were explained on the basis of evidence suggesting that naturalistic sounds access their associated meaning faster than spoken words do (Chen & Spence, 2013; Cummings et al., 2006; Saygin, Dick, & Bates, 2005). The different processing times plausibly stem from the differing routes of semantic access for each type of auditory stimulus: Naturalistic sounds access semantic information directly, whereas spoken words have to access their meanings via lexical representations (Barsalou, Santos, Simmons, & Wilson, 2008; Chen & Spence, 2011; Glaser & Glaser, 1989).

Lupyan and colleagues, on the other hand, demonstrated an advantage for spoken words over naturalistic sounds at longer SOAs (around 1,000 ms or more; Edmiston & Lupyan, 2015; Lupyan & Thompson-Schill, 2012). The participants in their studies had to verify whether the auditory cue (either a naturalistic sound or a spoken word) and the subsequently presented picture matched or not. The results demonstrated that the participants' reaction times (RTs) were shorter for spoken words than for naturalistic sounds. Further evidence comes from an event-related potential (ERP) study: When a spoken word led a target picture by around 1,670 ms, the P1 component associated with the picture (at 70–125 ms after onset) occurred earlier in the congruent than in the incongruent condition, but no such congruency effect was induced by naturalistic sounds (Boutonnet & Lupyan, 2015). These results were explained in terms of spoken words being associated with semantic representations that are more abstract and categorical, thus providing a conceptual cue regarding a given object that is general rather than specific to a particular exemplar, as compared to naturalistic sounds (Edmiston & Lupyan, 2015; Lupyan & Thompson-Schill, 2012).

Footnote: The interstimulus interval was 400, 1,000, or 1,500 ms in Lupyan and Thompson-Schill (2012). However, since they did not report the duration of the auditory cue, it is not possible to determine the corresponding SOAs. Nevertheless, the SOAs were certainly longer than the 346 ms in Chen and Spence (2011).

Given the different SOAs and methods, and given the different mechanisms proposed by previous research (Chen & Spence, 2011; Edmiston & Lupyan, 2015; Lupyan & Thompson-Schill, 2012), we wanted to carefully examine the time courses of the cross-modal semantic priming effects elicited by naturalistic sounds and spoken words. Two critical SOAs were chosen: The 350-ms SOA is close to the interval at which Chen and Spence (2011) demonstrated cross-modal semantic priming by naturalistic sounds (but not by spoken words) in a picture detection task. The 1,000-ms SOA (the interstimulus interval, ISI, was 500–650 ms) corresponds to an ISI somewhere between the 400 and 1,000 ms used by Lupyan and Thompson-Schill (2012), conditions that demonstrated a cross-modal semantic advantage for spoken words over naturalistic sounds. In Experiment 1, each participant was tested with only one of the SOAs, following the designs of Chen and Spence (2011) and Lupyan and Thompson-Schill (2012). In Experiment 2, participants were tested with both SOAs in a counterbalanced order. In this case, we further examined whether the time courses of the cross-modal semantic priming effects are stable or modulated by prior context.

Experiment 1

Method

Participants

Forty volunteers (10 males, mean age 22.2 years) took part in this experiment in exchange for course credit or five pounds (UK sterling). The participants were native English speakers or bilinguals who had started to learn English by 5 years of age. All participants had normal or corrected-to-normal vision and normal hearing by self-report, and all were naïve as to the purpose of the study. Written informed consent was obtained prior to the start of the study. The study was approved by the Medical Sciences Inter Divisional Research Ethics Committee, University of Oxford (MSD-IDREC-C1-2014-143).

Apparatus and stimuli

The visual stimuli were presented on a 23-inch LED monitor controlled by a personal computer. The participants sat at a viewing distance of 58 cm from the monitor in a dimly lit chamber. Twenty-four outline drawings (12 living and 12 nonliving things) taken from Snodgrass and Vanderwart (1980) and Bates et al. (2003), as well as their mirror images, were used as visual targets (see Appendix). Five pattern masks were created by overlapping 20 nonobject figures randomly selected from Magnié, Besson, Poncet, and Dolisi (2003). Each pattern covered an area of 5.9° × 5.9°, sufficient to completely occlude all of the target pictures.

The auditory stimuli (8-bit mono; 22,500-Hz digitization) were presented over closed-ear headphones and ranged in loudness from 31 to 51 dB sound pressure level (SPL). The naturalistic sounds were those produced by each of the objects. The spoken words consisted of the most commonly agreed-upon name used to refer to each picture (Bates et al., 2003; Snodgrass & Vanderwart, 1980) and were produced by a female native English speaker. The naturalistic sound and the spoken word associated with the same picture were edited to have the same duration. The root mean square values of all of the auditory stimuli were equalized.

Design

Two within-participants factors, prime type (naturalistic sound or spoken word) and congruency (congruent or incongruent), and one between-participants factor, SOA (350 or 1,000 ms), were manipulated. Naturalistic sounds and spoken words were presented in separate blocks of trials. Congruent and incongruent trials were mixed within blocks: The auditory cue matched the picture in the congruent trials, whereas the two belonged to different categories, based on the fundamental living thing versus nonliving thing separation, in the incongruent trials. Each SOA was tested with 20 participants.

All 24 pictures and their mirror images were presented once in each block: Either one was presented in the congruent trial and the other in the incongruent trial (and they were swapped in another block). These trials were used to estimate the participant's hit rate in the congruent and incongruent conditions, respectively. An additional 48 picture-absent trials, consisting of an auditory cue and a blank frame, were presented to estimate the participant's false alarm (FA) rate. These 96 trials were presented in a completely randomized order. There were two blocks for both naturalistic sounds and spoken words, and the order of these two types of auditory stimuli was counterbalanced across participants. The participants were not given any information concerning the possible semantic congruency between the auditory cue and the picture prior to taking part in the study.

Procedure

The participants initiated a block of trials by pressing the enter key on the keyboard in front of them. In each trial (see Fig. 1a), a blank frame was followed by either a frame with a picture or another blank for 17 ms (one frame at the screen refresh rate of 60 Hz). The pattern mask was presented immediately thereafter; meanwhile, the participants had to decide whether they had seen a picture (irrespective of its identity) presented before the mask by pressing the space bar. The participants were informed that the task was not speeded, and that they should only respond if they were sure that they had seen a picture (i.e., they should maintain a strict response criterion).

Prior to the start of the main experiment, all of the pictures and their matched names were presented on the monitor in a completely randomized order across participants. Each picture-name pair was presented for 1,500 ms and interleaved by a blank frame for 500 ms. An easy practice session (eight trials with a picture duration of 33 ms) and a harder practice session (16 trials with a picture duration of 17 ms) were conducted prior to the main experiment. In the easy practice session, the accuracy had to reach 85%, or the session was repeated up to three times. The stimuli in the practice sessions were not used in the main experiment. The experiment lasted approximately 30 minutes.

Fig. 1 (a) Sequence of the three frames presented in each trial: a blank (1,483 ms), a target picture (e.g., a dog; 17 ms), and a pattern mask (1,500 ms). The target picture and pattern mask were presented in black in the center of a white background. (b) The current experimental design in terms of signal detection theory. The dashed, dotted, and solid distributions of internal responses represent the target-present/congruent, target-present/incongruent, and target-absent conditions, respectively. In this design, the congruent and incongruent conditions share the same FA rate. Sensitivity (d') was calculated separately for the congruent and incongruent conditions using the equation d' = z(hit rate) - z(FA rate) (Green & Swets, 1966; Macmillan & Creelman, 2005)

Results

For both naturalistic sounds and spoken words, the hit rate in the congruent and incongruent conditions was estimated on the basis of 48 trials (24 pictures × 2 blocks), while the FA rate was estimated on the basis of 96 trials (48 picture-absent trials × 2 blocks; see Table 1); d' values were calculated based on the hit and FA rates (see Figs. 1b–2), and then submitted to a three-way analysis of variance (ANOVA) with the factors of congruency, prime type, and SOA (see Table 2 for the results). Critically, there was a significant three-way interaction. Paired t tests (Holm-Bonferroni correction; one-tailed tests were used because higher d' in the congruent than in the incongruent condition was expected) demonstrated a congruency effect for naturalistic sounds at the 350-ms SOA, t(19) = 2.81, p < .05, but not at the 1,000-ms SOA, t(19) = -1.63, p = .12; in contrast, the congruency effect for spoken words occurred at the 1,000-ms SOA, t(19) = 2.87, p < .05, but not at the 350-ms SOA, t(19) = 0.32, p = .75. We therefore replicated the results at the 350-ms SOA reported in Chen and Spence (2011).

Table 1. Percentage of hit and false alarm (FA) rates (SE in parentheses) in each of the conditions in Experiment 1

SOA (ms)   Sound type           Hit rate, congruent   Hit rate, incongruent   FA rate
350        Naturalistic sound   83.4 (3.8)            79.1 (4.0)              7.1 (2.6)
350        Spoken word          84.0 (3.6)            81.9 (4.6)              6.7 (2.1)
1,000      Naturalistic sound   65.3 (3.7)            68.3 (4.2)              10.3 (4.0)
1,000      Spoken word          73.8 (3.5)            67.3 (3.9)              12.4 (3.6)

Fig. 2 Mean sensitivity (d') at the 350- and 1,000-ms SOAs in Experiment 1. Error bars indicate ±1 SEM. Sound = naturalistic sounds; Word = spoken words

Table 2. Results of the analysis of sensitivity (d') in Experiment 1 (three-way ANOVA: Congruency × Prime Type × SOA)

Effect                           F(1, 38)   p       η²     Note
Congruency                       5.92       <.05    0.14   Congruent (2.58) > Incongruent (2.50)
SOA                              11.22      <.005   0.23   350 ms (2.99) > 1,000 ms (2.10)
Congruency × Prime Type × SOA    9.99       <.005   0.21

Experiment 2

Method

Thirty-six volunteers (seven males, mean age 19.7 years) took part in this experiment. Three factors were tested with all participants: prime type (naturalistic sound or spoken word), congruency (congruent or incongruent), and SOA (350 or 1,000 ms). The fourth factor, the order in which the SOAs were tested, was manipulated between participants: Half of the participants were tested with the 350-ms SOA in the first session and the 1,000-ms SOA in the second session (Group 1: 350–1,000 ms); the order was reversed for the remainder of the participants (Group 2: 1,000–350 ms). The stimuli and task were the same as in Experiment 1. The experiment took an hour to complete.

Results

Each participant's d' (see Fig. 3) was calculated based on the hit and FA rates in each condition (see Table 3), and then submitted to a four-way ANOVA (see Table 4a for the results). There was a significant three-way interaction between congruency, prime type, and order. Two separate two-way ANOVAs for each prime type, with the factors of congruency and order, demonstrated that the congruency effect was significant for naturalistic sounds, F(1, 34) = 7.50, p < .05, η² = 0.18, without being modulated by order (Congruency × Order: F < 1, p = .87, η² = 0.001). However, for spoken words, the congruency effect was modulated by order, Congruency × Order: F(1, 34) = 12.41, p < .005, η² = 0.27. Post hoc tests demonstrated that the congruency effect for spoken words was significant in Group 2 (1,000–350 ms), t(17) = 6.18, p < .001, but not in Group 1 (350–1,000 ms), t(17) = 0.65, p = .53. These results therefore suggest that the SOA order influenced the cross-modal semantic congruency effect elicited by spoken words but not that elicited by naturalistic sounds. Such a carryover effect of the SOA from one session to the next may mask the modulation by SOA of the congruency effect from auditory cues. The data from the two sessions were therefore analyzed separately.

When only including the data from the first session (top row in Fig. 3), a three-way ANOVA with the factors of congruency, prime type, and SOA (with SOA as a between-participants factor) was conducted (see Table 4b). This is the same design as in Experiment 1, and the results were replicated: The three-way interaction was significant. Paired t tests demonstrated that the congruency effect for naturalistic sounds was only observed at the 350-ms SOA, t(17) = 2.50, p < .05, but not at the 1,000-ms SOA, t(17) = -0.27, p = .79. In contrast, the congruency effect for spoken words was only statistically significant at the 1,000-ms SOA, t(17) = 3.94, p < .005, but not at the 350-ms SOA, t(17) = 0.50, p = .62.

The results of the second session (bottom row in Fig. 3) were different from those of the first session (see Table 4c). The significant interaction between congruency and SOA was attributable to the fact that the congruency effect was only significant at the 350-ms SOA, t(17) = 4.69, p < .001, but not at the 1,000-ms SOA, t(17) = 0.16, p = .87. Planned comparisons demonstrated that the congruency effect was significant at the 350-ms SOA for both naturalistic sounds, t(17) = 2.98, p < .05, and spoken words, t(17) = 4.30, p < .001, but for neither at the 1,000-ms SOA, t(17) = -0.57, p = .58, and t(17) = 0.65, p = .53, respectively. The significant interaction between congruency and prime type reflected the congruency effect being significant for spoken words, t(35) = 3.48, p < .005, but only marginally significant for naturalistic sounds, t(35) = 1.88, p = .07. The latter perhaps results from the slightly higher d' in the incongruent than in the congruent condition at the 1,000-ms SOA.

Fig. 3 Mean sensitivity (d') at the 350- and 1,000-ms SOAs for Group 1 (tested in the order 350- and then 1,000-ms SOA) and Group 2 (with the order reversed) in Experiment 2. Error bars indicate ±1 SEM. Sound = naturalistic sounds; Word = spoken words

Table 3. Percentage of hit and false alarm (FA) rates (SE in parentheses) in each of the conditions in Experiment 2

Group     SOA (ms)                 Sound type           Hit rate, congruent   Hit rate, incongruent   FA rate
Group 1   350 (first session)      Naturalistic sound   88.0 (2.3)            81.0 (4.3)              3.2 (1.4)
                                   Spoken word          82.6 (3.8)            81.4 (4.0)              2.2 (0.6)
          1,000 (second session)   Naturalistic sound   71.9 (3.7)            72.3 (4.1)              2.5 (0.9)
                                   Spoken word          71.4 (4.4)            69.0 (5.2)              1.4 (0.4)
Group 2   350 (second session)     Naturalistic sound   80.7 (4.0)            75.8 (5.1)              5.6 (3.3)
                                   Spoken word          79.2 (3.8)            67.4 (5.4)              3.5 (2.2)
          1,000 (first session)    Naturalistic sound   69.4 (4.0)            69.7 (4.5)              4.7 (2.2)
                                   Spoken word          71.1 (4.9)            64.0 (5.9)              5.5 (2.3)

Table 4. Results of the analysis of sensitivity (d') in Experiment 2

(A) Four-way ANOVA (Congruency × Prime Type × SOA × Order)
Effect                            F(1, 34)   p       η²     Note
Congruency                        17.85      <.001   0.34   Congruent (2.99) > Incongruent (2.85)
SOA                               45.32      <.001   0.57   350 ms (3.13) > 1,000 ms (2.70)
Congruency × SOA                  10.15      <.005   0.23
Congruency × Prime Type           6.46       <.05    0.16
Congruency × Order                4.49       <.05    0.12
Congruency × Prime Type × Order   16.95      <.001   0.33

(B) First session: three-way ANOVA (Congruency × Prime Type × SOA)
Congruency                        8.85       <.01    0.21   Congruent (2.98) > Incongruent (2.85)
SOA                               6.65       <.05    0.16   350 ms (3.26) > 1,000 ms (2.58)
Congruency × Prime Type × SOA     12.24      <.005   0.27

(C) Second session: three-way ANOVA (Congruency × Prime Type × SOA)
Congruency                        15.83      <.001   0.32   Congruent (2.99) > Incongruent (2.84)
Congruency × SOA                  14.41      <.005   0.30
Congruency × Prime Type           4.96       <.05    0.13

General discussion

The results of the present study demonstrate that the presentation of naturalistic sounds enhanced the visual sensitivity for semantically congruent pictures at a shorter SOA (350 ms) than spoken words did (1,000 ms) when each participant encountered just one of the SOAs. The cross-modal semantic priming effects elicited by the presentation of naturalistic sounds versus spoken words can therefore be dissociated in terms of their differing time courses. Furthermore, naturalistic sounds consistently primed the visual pictures at the short SOA; in contrast, the priming effect elicited by spoken words was significantly modulated by the SOA tested beforehand. Specifically, when the 1,000-ms SOA (demonstrating a significant priming effect) was tested first, the priming effect carried over to the 350-ms SOA (this was not observed in Experiment 1). However, when the 350-ms SOA (where no priming effect was observed) was tested first, the priming effect at the 1,000-ms SOA was eliminated as well. Finally, higher sensitivity at the 350-ms than at the 1,000-ms SOA was observed in both experiments. This can be explained by an attentional cuing effect elicited by the presentation of a temporally close auditory cue (McDonald, Teder-Sälejärvi, & Hillyard, 2000).

That naturalistic sounds elicited the cross-modal semantic priming effect faster (i.e., at the shorter SOA) than spoken words did suggests that the time required to access meaning is shorter for the former (Chen & Spence, 2013; Cummings et al., 2006; Saygin et al., 2005). Consistent evidence comes from the results of ERP studies: For instance, Murray, Camen, Andino, Bovet, and Clarke (2006) demonstrated that the brain activities associated with naturalistic sounds produced by living versus nonliving things can be discriminated around 70 to 119 ms after sound onset. The component associated with the meaning of spoken words (the N400; Kutas & Hillyard, 1980), on the other hand, typically starts 200 ms after word onset, and it can be delayed if the word is longer or else shares initial syllables with other words (Van Petten, Coulson, Rubin, Plante, & Parks, 1999). These results may partly be attributed to the nature of the acoustic signals associated with each type of stimulus: Naturalistic sounds associated with different object categories have time-frequency spectra that are distinct from one another (e.g., Murray et al., 2006). By contrast, spoken words become comprehensible only once the acoustic signals have been abstracted into various phonetic representations, which are then used to access the associated lexical representations (Obleser & Eisner, 2009). Consequently, a semantic network suggests that naturalistic sounds and visual pictures access semantics directly, whereas spoken words access their meanings via lexical representations (Chen & Spence, 2011, 2017; Glaser & Glaser, 1989). Hence, the cross-modal semantic interactions between naturalistic sounds and pictures would be expected to occur more rapidly than those between spoken words and pictures, as demonstrated in the present study.

At the 1,000-ms SOA, only spoken words, but not naturalistic sounds, gave rise to cross-modal semantic priming effects, thus suggesting that the effect induced by naturalistic sounds is short-lived (see also Chen & Spence, 2017; Kim, Porter, & Goolkasian, 2014, using a picture categorization task). Given that naturalistic sounds can access their meaning rapidly (within 350 ms in the current study), the short-lived priming effect suggests that the activated meaning is forgotten rapidly as well, unless the information can be temporarily maintained. The maintenance of representations of naturalistic sounds, nevertheless, is underpinned by auditory imagery capability, or else by their being transferred into lexical codes and stored in the phonological loop (Snyder & Gregg, 2011; Soemer & Saito, 2015), and both processes take extra time or cognitive resources. In contrast, spoken words essentially have the benefit of being maintained in the phonological loop of the working memory system (Baddeley, 2012), thus leading to a significant priming effect over a greater range of SOAs than naturalistic sounds (current study; Chen & Spence, 2017).

The final contrast lies in the fact that the time course of the cross-modal semantic priming effect elicited by naturalistic sounds was stable, whereas that elicited by spoken words was modulated by the prior context (i.e., the order in which the SOAs were tested). Audiovisual integration/interactions involving speech sounds have been demonstrated to be flexible. For example, the cross-modal semantic priming effect elicited by spoken words can be speeded up so as to be observed at around the 350-ms SOA if the participants have been exposed to the longer SOA condition (the current study) or if the participants have to identify the target picture by reporting its name (Chen & Spence, 2011). In addition, the integration of verbal cues and visual lip movements occurs more often (indexed by a larger McGurk effect; McGurk & MacDonald, 1976) if a series of congruent (as compared to incongruent) audiovisual speech stimuli has been presented beforehand (Nahorna, Berthommier, & Schwartz, 2012). Finally, digits and letters presented subliminally to both vision and audition were integrated only if the participants had consciously experienced these pairings prior to the test (Faivre, Mudrik, Schwartz, & Koch, 2014). The higher flexibility of audiovisual interactions involving spoken words than those involving naturalistic sounds at the semantic level perhaps stems from the former accessing semantic representations at an abstract, categorical, and modality-insensitive level, whereas the latter serve as modality-specific and context-dependent attributes associated with the object cross-modally (Edmiston & Lupyan, 2015; Waxman & Gelman, 2009).

Together, the results of the two experiments reported here demonstrate that naturalistic sounds elicit more rapid cross-modal priming than do spoken words, which is likely determined by their speed of semantic access stemming from the different processing routes. On the other hand, the advantage of spoken words over naturalistic sounds in priming visual pictures across a more prolonged prime-target interval should result from the former being better maintained in working memory. Finally, consistent with previous studies, interactions between spoken words and visual signals are flexible; that is, they can be enhanced or inhibited by prior context or by task demands.

Acknowledgements The authors are supported by the Arts and Humanities Research Council (AHRC), Rethinking the Senses grant (AH/L007053/1).

Appendix

The auditory stimuli used in the present study. Note that the lengths of the naturalistic sound and the spoken word referring to the same picture were matched at 350 ms for the 14 one-syllable words, 450 ms for the seven two-syllable words, and 500 ms for the three- (or more) syllable words (three words). The identification accuracy, confidence, and familiarity ratings (maximum score = 7) for the sounds reflect the mean performance of 18 participants (four males, mean age 28 years; reported in Chen & Spence, 2011). The ratings of imagery concordance (maximum score = 5) were acquired via an online study (30 participants for naturalistic sounds, 18 males, mean age 31 years; 30 participants for spoken words, 19 males, mean age 33 years). All scores were lower for naturalistic sounds than for spoken words. Identification accuracy: t(23) = 5.42, p < .001; confidence rating: t(25) = 8.33, p < .001; familiarity rating: t(26) = 8.38, p < .001; imagery concordance: t(24) = 4.24, p < .001; two-tailed, unequal variance assumed.

                                              Naturalistic sounds                Spoken words
Picture       Naturalistic sound   Duration   ID (%)  Conf.  Fam.  Imag.   ID (%)  Conf.  Fam.  Imag.
(spoken word)                      (ms)
Bird          Bird chirping        350        77.8    4.7    4.8   4.4     100     6.9    6.9   4.7
Cat           "Meow"               350        100     6.2    6.3   4.4     100     7.0    7.0   4.8
Cow           "Moo"                350        88.9    6.4    6.3   4.8     100     6.9    7.0   4.8
Dog           "Woof"               350        94.4    6.4    6.4   4.3     100     6.9    6.9   4.8
Duck          "Quack"              350        100     6.8    6.3   4.5     100     6.9    6.9   4.8
Eagle         Eagle call           450        55.6    4.9    4.6   4.5     100     6.8    6.9   4.8
Elephant      Roar                 500        61.1    4.8    4.6   4.7     100     6.9    6.9   4.9
Frog          "Ribbit"             350        83.3    5.1    5.1   4.2     100     7.0    7.0   4.9
Goat          "Baa-baa"            350        72.2    5.6    5.4   4.4     100     6.9    6.7   4.9
Horse         Horse neighing       350        61.1    5.2    4.9   4.0     94      6.2    6.7   4.9
Pig           "Oink"               350        50.0    5.2    5.4   4.2     100     6.9    7.0   4.9
Rooster       Crowing sound        450        88.9    5.8    5.6   4.7     100     6.9    6.2   4.8
Car           Car starting         350        77.8    5.4    5.8   4.0     100     7.0    7.0   4.6
Door          Creak                350        88.9    4.8    5.2   4.3     100     6.9    6.8   4.7
Drum          Banging of a drum    350        94.4    5.8    5.8   4.3     100     7.0    6.8   4.8
Guitar        Guitar sound         450        88.9    6.0    6.4   4.8     100     7.0    6.9   4.9
Gun           Gunshot              350        61.1    4.3    4.1   3.6     100     6.9    6.9   4.8
Motorcycle    "Vroom-vroom"        500        33.3    4.8    5.1   3.4     100     6.9    6.7   4.7
Piano         Piano sound          450        94.4    6.1    6.2   4.8     100     7.0    6.9   4.7
Scissors      Clipping scissors    450        22.2    3.2    3.8   3.7     100     6.9    6.9   4.9
Switch        Click                350        27.8    3.8    4.2   2.0     100     6.6    6.5   4.6
Telephone     Telephone ringing    500        100     6.6    6.7   4.7     100     7.0    7.0   4.6
Trumpet       Trumpet sound        450        83.3    5.3    5.3   4.6     100     7.0    6.7   4.9
Whistle       Whistling            450        66.7    5.4    5.5   4.8     94.4    6.8    6.6   4.9
Mean (SEM)                                    60.9    5.2    5.2   4.3     98.7    6.9    6.8   4.8
                                              (7.9)   (0.2)  (0.2) (0.1)   (0.9)   (0.1)  (0.1) (0.02)

ID = identification accuracy; Conf. = mean confidence rating; Fam. = mean familiarity rating; Imag. = mean imagery concordance rating
The stimuli used in the practice sessions were fly (humming fly), tiger (tiger roaring), bell (bell ringing), and cannon (cannon fire) pictures and environmental sounds.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

References

Baddeley, A. (2012). Working memory: Theories, models, and controversies. Annual Review of Psychology, 63, 1–29.
Barsalou, L. W., Santos, A., Simmons, W. K., & Wilson, C. D. (2008). Language and simulation in conceptual processing. In M. de Vega, A. Glenberg, & A. Graesser (Eds.), Symbols and embodiment: Debates on meaning and cognition (pp. 245–283). Oxford: Oxford University Press.
Bates, E., D’Amico, S., Jacobsen, T., Székely, A., Andonova, E., Devescovi, A., & Tzeng, O. (2003). Timed picture naming in seven languages. Psychonomic Bulletin & Review, 10, 344–380.
Boutonnet, B., & Lupyan, G. (2015). Words jump-start vision: A label advantage in object recognition. Journal of Neuroscience, 35, 9329–9335.
Chen, Y.-C., & Spence, C. (2010). When hearing the bark helps to identify the dog: Semantically-congruent sounds modulate the identification of masked pictures. Cognition, 114, 389–404.
Chen, Y.-C., & Spence, C. (2011). Crossmodal semantic priming by naturalistic sounds and spoken words enhances visual sensitivity. Journal of Experimental Psychology: Human Perception and Performance, 37, 1554–1568.
Chen, Y.-C., & Spence, C. (2013). The time-course of the cross-modal semantic modulation of visual picture processing by naturalistic sounds and spoken words. Multisensory Research, 26, 371–386.
Chen, Y.-C., & Spence, C. (2017). Comparing audiovisual semantic interactions between linguistic and non-linguistic stimuli. Manuscript submitted for publication.
Cummings, A., Čeponienė, R., Koyama, A., Saygin, A. P., Townsend, J., & Dick, F. (2006). Auditory semantic networks for words and natural sounds. Brain Research, 1115, 92–107.
Edmiston, P., & Lupyan, G. (2015). What makes words special? Words as unmotivated cues. Cognition, 143, 93–100.
Faivre, N., Mudrik, L., Schwartz, N., & Koch, C. (2014). Multisensory integration in complete unawareness: Evidence from audiovisual congruency priming. Psychological Science, 25, 2006–2016.
Glaser, W. R., & Glaser, M. O. (1989). Context effects in Stroop-like word and picture processing. Journal of Experimental Psychology: General, 118, 13–42.
Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. New York: Wiley.
Kim, Y., Porter, A. M., & Goolkasian, P. (2014). Conceptual priming with pictures and environmental sounds. Acta Psychologica, 146, 73–83.
Kutas, M., & Hillyard, S. A. (1980). Reading senseless sentences: Brain potentials reflect semantic incongruity. Science, 207, 203–205.
Lupyan, G., & Thompson-Schill, S. L. (2012). The evocative power of words: Activation of concepts by verbal and nonverbal means. Journal of Experimental Psychology: General, 141, 170–186.
Lupyan, G., & Ward, E. J. (2013). Language can boost otherwise unseen objects into visual awareness. Proceedings of the National Academy of Sciences of the United States of America, 110, 14196–14201.
Macmillan, N. A., & Creelman, C. D. (2005). Detection theory: A user’s guide. Mahwah, NJ: Erlbaum.
Magnié, M. N., Besson, M., Poncet, M., & Dolisi, C. (2003). The Snodgrass and Vanderwart set revisited: Norms for object manipulability and for pictorial ambiguity of objects, chimeric objects, and nonobjects. Journal of Clinical and Experimental Neuropsychology, 25, 521–560.
McDonald, J. J., Teder-Sälejärvi, W. A., & Hillyard, S. A. (2000). Involuntary orienting to sound improves visual perception. Nature, 407, 906–908.
McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746–748.
Murray, M. M., Camen, C., Andino, S. L. G., Bovet, P., & Clarke, S. (2006). Rapid brain discrimination of sounds of objects. Journal of Neuroscience, 26, 1293–1302.
Nahorna, O., Berthommier, F., & Schwartz, J. L. (2012). Binding and unbinding the auditory and visual streams in the McGurk effect. Journal of the Acoustical Society of America, 132, 1061–1077.
Obleser, J., & Eisner, F. (2009). Pre-lexical abstraction of speech in the auditory cortex. Trends in Cognitive Sciences, 13, 14–19.
Saygin, A. P., Dick, F., & Bates, E. (2005). An on-line task for contrasting auditory processing in the verbal and nonverbal domains and norms for younger and older adults. Behavior Research Methods, 37, 99–110.
Snodgrass, J. G., & Vanderwart, M. (1980). A standardized set of 260 pictures: Norms for name agreement, image agreement, familiarity, and visual complexity. Journal of Experimental Psychology: Human Learning and Memory, 6, 174–215.
Snyder, J. S., & Gregg, M. K. (2011). Memory for sound, with an ear toward hearing in complex auditory scenes. Attention, Perception, & Psychophysics, 73, 1993–2007.
Soemer, A., & Saito, S. (2015). Maintenance of auditory-nonverbal information in working memory. Psychonomic Bulletin & Review, 22, 1777–1783.
Van Petten, C., Coulson, S., Rubin, S., Plante, E., & Parks, M. (1999). Time course of word identification and semantic integration in spoken language. Journal of Experimental Psychology: Learning, Memory, and Cognition, 25, 394–417.
Waxman, S. R., & Gelman, S. A. (2009). Early word-learning entails reference, not merely associations. Trends in Cognitive Sciences, 13, 258–263.

