Feng, Wenfeng

Abstract

The present study recorded event-related potentials (ERPs) in a visual object-recognition task under the attentional blink paradigm to explore the temporal dynamics of the cross-modal boost on attentional blink and whether this auditory benefit would be modulated by semantic congruency between T2 and the simultaneous sound. Behaviorally, the present study showed that not only a semantically congruent but also a semantically incongruent sound improved T2 discrimination during the attentional blink interval, although the enhancement was larger for the congruent sound. The ERP results revealed that the behavioral improvements induced by both the semantically congruent and incongruent sounds were closely associated with an early cross-modal interaction on the occipital N195 (192–228 ms). In contrast, the lower T2 accuracy for the incongruent than the congruent condition was accompanied by a larger late occurring centro-parietal N440 (424–448 ms). These findings suggest that the cross-modal boost on attentional blink is hierarchical: the task-irrelevant but simultaneous sound, irrespective of its semantic relevance, first enables T2 to escape the attentional blink by cross-modally strengthening the early stage of visual object-recognition processing, whereas the semantic conflict of the sound begins to interfere with visual awareness only at a later stage, when the representation of the visual object is extracted.

Keywords: attentional blink, cross-modal interaction, ERPs, semantic congruency

Introduction

In everyday life, a considerable number of higher cognitive functions, such as learning and memory, depend on the effective integration of information from more than one sensory modality. The last two decades have witnessed a sharp increase in the understanding of the psychological and physiological mechanisms of multisensory integration. Within this context, the interplay between attention and multisensory integration has become a recent research focus (for reviews, see Talsma et al. 2010; Talsma 2015; De Meo et al. 2015; Tang et al. 2016). On the one hand, multisensory processing has been shown to be modulated by various forms of attention, such as spatial attention (Busse et al. 2005; Talsma and Woldorff 2005; Mishra et al. 2010; Zimmer et al. 2010), modality-based attention (Talsma et al. 2007; Mishra and Gazzaley 2012, 2013), object-based attention (Molholm et al. 2004, 2007; Fiebelkorn et al. 2010), and attentional resources (Michail and Keil 2018). On the other hand, previous studies have also revealed that multisensory integration can conversely affect the allocation of attention, as in the well-known "pip and pop" effect (Van der Burg et al. 2008, 2011). Another striking but largely overlooked example in the literature is the cross-modal boost on attentional blink (Olivers and Van der Burg 2008; but see Kranczioch and Thorne 2013 for a different opinion). Specifically, if two successive targets that both have to be discriminated are embedded in a rapid stream of visual stimuli, observers often fail to discriminate or even detect the second target (T2) when it is temporally close to the first (T1), a phenomenon termed the attentional blink (Raymond et al. 1992). The cross-modal boost on attentional blink refers to the finding that a task-irrelevant, noninformative sound presented synchronously with T2 can substantially reduce the magnitude of the attentional blink, as indexed by improved T2 discrimination (Olivers and Van der Burg 2008; Kranczioch and Thorne 2013).
Although influential empirical evidence and theories about the attentional blink per se suggest that the T2 deficit reflects an impairment at a postperceptual stage of processing (e.g., working memory) (Shapiro and Raymond 1994; Chun and Potter 1995; Vogel et al. 1998; Sergent et al. 2005; Di Lollo et al. 2005; Olivers et al. 2007; for review, see Dux and Marois 2009), the cross-modal boost on attentional blink (i.e., the effect of sound in reducing the T2 deficit) has been shown to occur at an early perceptual stage of processing (Kranczioch and Thorne 2015). However, since previous studies investigating the cross-modal boost on attentional blink typically employed nonspecific, meaningless sounds (e.g., pure tones) rather than natural environmental sounds (e.g., barks of dogs) as the auditory cues (Olivers and Van der Burg 2008; Kranczioch and Thorne 2013, 2015), the contribution that higher order audiovisual integration, based on the semantic relevance between visual and auditory signals, may also make to the cross-modal boost on attentional blink is still poorly understood. A growing number of studies have begun to investigate the effect of audiovisual semantic congruency on multisensory integration, given that efficiently integrating the multifarious multisensory information of real life depends not only on the temporal and spatial co-occurrence of signals from different modalities, but also on the preexisting, highly learned associations among well-known multisensory objects. In spite of differences in experimental paradigms, previous studies have found that compared with semantically congruent multisensory objects (e.g., barks with dogs), incongruent objects (e.g., barks with cats) lead to slower reaction times (Laurienti et al. 2004; Suied et al. 2009; Yuval-Greenberg and Deouell 2009; Chen and Spence 2013, 2018), lower accuracy (Iordanescu et al. 2008; Chen and Spence 2010), larger conflict-related N2 and N400 components (Molholm et al. 2004; Zimmer et al. 2010; Kang et al. 2018) and stronger activations in dorsolateral prefrontal and anterior cingulate cortices (Weissman et al. 2004, 2009), as well as a smaller occipital N1 component (Molholm et al. 2004) and a smaller nonphase-locked gamma-band response (Yuval-Greenberg and Deouell 2007). In terms of the cross-modal boost on attentional blink, however, it is currently unclear whether and when this auditory benefit on visual attention would be modulated by semantic congruency between T2 and the coincident sound. Although a recent behavioral study (Adam and Noppeney 2014) has shown that semantically congruent relative to incongruent sounds improved T2 discrimination during the attentional blink interval, the neural mechanisms underlying the audiovisual congruency effect on attentional blink remain unclear. In addition, because the semantically incongruent audiovisual condition was compared directly with the congruent condition rather than with a unisensory visual T2 condition (i.e., T2 without an auditory stimulus) in that study, it is still unknown whether a semantically incongruent sound would also enhance or, instead, impair T2 discrimination.
The present study investigated the above-mentioned questions by examining event-related potentials (ERPs) recorded in a visual object-recognition task under the attentional blink paradigm, wherein task-irrelevant but natural sounds could be either presented synchronously with T2s or absent, and, when presented, could be either semantically congruent or incongruent with T2s (e.g., barks with dogs, or barks with cars). We found that both the congruent and incongruent sounds improved T2 discrimination during the attentional blink interval, but the enhancement was significantly larger for the congruent than the incongruent sounds. The behavioral improvements induced by both the congruent and incongruent sounds were closely associated with an early cross-modal interaction (N195 component, 192–228 ms after T2 onset) over occipital scalp, indicating that cross-modal neural activity at a relatively early stage of visual object discrimination underlies the auditory benefit on attentional blink. In contrast, the smaller T2 accuracy improvement for the incongruent than the congruent condition was accompanied by a greater late occurring N440 effect (424–448 ms) over the centro-parietal region, suggesting that the additional semantic congruency effect on the cross-modal boost on attentional blink has a late processing locus.

Materials and Methods

Participants

A total of 36 neurologically healthy subjects participated in the experiment after giving written informed consent as approved by the Human Research Protections Program of Soochow University. All experimental procedures were in agreement with the Declaration of Helsinki. All subjects reported normal or corrected-to-normal vision as well as normal audition. Data from two participants were excluded, one because the participant quit during the experiment and the other because of excessive electroencephalogram (EEG) artifacts (>40% of epochs), leaving data of 34 subjects (22 females and 12 males; age range of 18–28 years, mean age of 20.6 years; all right-handed) for further analysis.

Stimuli, Apparatus, and Task

The visual stimuli consisted of 48 black-and-white line drawings, including 30 unique drawings of houses used as distractors, 9 unique drawings (3 clothes, 3 cups, and 3 flowers) used as the first target (T1), and the remaining 9 unique drawings (3 dogs, 3 cars, and 3 drums) used as the second target (T2; see Fig. 1B,C). The line drawings for T1 and T2 were drawn from two nonoverlapping sets in order to avoid priming effects (Koelewijn et al. 2008) or repetition blindness (Kanwisher 1987). These line drawings were selected carefully through Internet searches and then standardized to a size of 5.6° × 4.5° in visual angle. The auditory stimuli comprised 9 unique natural sounds (3 barks of dogs, 3 beeps of cars, and 3 beats of drums; all stereo) that were also collected from the Internet (http://www.findsounds.com) and then standardized to 200 ms in duration (with 20-ms rise and fall ramps) and approximately 75 dB in loudness at the subjects' ears. The 200-ms duration was chosen because it not only made the sounds short enough for the rapid serial visual presentation (RSVP) procedure, but also left them fully recognizable (Adam and Noppeney 2014).

Figure 1. (A) Schematic illustration of the RSVP stream for a trial on which T2 was presented 3 positions after T1 (i.e., at lag 3) and on which a semantically congruent sound was presented simultaneously with T2. Each line drawing in the RSVP stream was presented for 100 ms.
On a given trial, T2 could be presented either at lag 3 or at lag 8 and could be one of the 9 drawings or be blank. A task-irrelevant natural sound could be delivered synchronously with the presented T2, alone (with a "blank T2"), or be absent. When synchronous with the presented T2, the sound could be either semantically congruent or incongruent with T2 (e.g., a bark with a dog, or a bark with a car). The task for participants was to discriminate sequentially the exact identities of T1 and T2 in an unspeeded fashion, while ignoring all sounds (if delivered). Note that the fixation "+" was presented in red during the experiment. (B) Illustration of the nine options for T1. (C) The ten options for T2, including the "blank T2".

The experiment was performed in a dark and sound-attenuated chamber. Stimulus presentation was scripted using "Presentation" software (version 18.0, NeuroBehavioral Systems, Inc.). The visual stimuli were presented on a 27-inch LCD monitor (ASUS PG279Q, resolution of 1920 × 1080, refresh rate of 120 Hz) on which the background color was set to gray, and the auditory stimuli (if presented) were delivered by a pair of loudspeakers (HiVi X3) positioned at the left and right sides of the monitor. The horizontal and vertical distances between each of the loudspeakers and the center of the monitor were 21.5° and 0° in visual angle, respectively. Subjects sat in front of the monitor at a viewing distance of approximately 80 cm and were required to keep their eyes fixated on a red cross (0.3° × 0.3° of visual angle) displayed at the center of the screen throughout each trial. Each trial started with the presentation of the fixation for a fixed period of 1000 ms, which was immediately followed by an RSVP stream presented at the center of the screen (Fig. 1A). The RSVP stream comprised 17 distinct line drawings, including the two targets (T1 and T2) and 15 distractors (selected randomly, without repetition, from the 30 drawings of houses on each trial), and each line drawing was presented for 100 ms with a stimulus-onset asynchrony (SOA) of 100 ms. T1 could be any one of 9 drawings (3 clothes, 3 cups, and 3 flowers; see Fig. 1B) with equal probability, and was presented randomly and equiprobably at the third, fourth, or fifth position in the RSVP stream. T2 was presented either 3 positions after T1 (i.e., at lag 3, T1-to-T2 SOA = 300 ms; 66.7% of all trials) or 8 positions after T1 (i.e., at lag 8, T1-to-T2 SOA = 800 ms; 33.3% of all trials).
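For concreteness, the trial structure just described can be expressed as a short generator. The following is a minimal Python sketch of the design only (the experiment itself was scripted in "Presentation"); positions are 1-based, and all function and variable names are our own:

    import numpy as np

    rng = np.random.default_rng(seed=1)

    def make_trial():
        """Sketch of one RSVP trial's schedule as described above."""
        t1_pos = int(rng.integers(3, 6))             # T1 equiprobably at position 3, 4, or 5
        lag = int(rng.choice([3, 8], p=[2/3, 1/3]))  # lag 3 on 66.7% of trials, lag 8 on 33.3%
        t2_pos = t1_pos + lag                        # T1-to-T2 SOA = lag * 100 ms
        # 15 distractor houses drawn without repetition from the 30 available drawings
        distractors = rng.choice(30, size=15, replace=False)
        return t1_pos, t2_pos, lag, distractors

    t1_pos, t2_pos, lag, _ = make_trial()
    print(f"T1 at position {t1_pos}, T2 at position {t2_pos} "
          f"(lag {lag}, T1-to-T2 SOA = {lag * 100} ms)")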
Note that we included more trials for the lag 3 condition because (1) the present study focused particularly on the cross-modal boost on attentional blink; (2) lag 3 trials were where a strong attentional blink effect was expected (cf. Adam and Noppeney 2014; Kranczioch and Thorne 2015; Maier and Rahman 2018); and (3) a large number of lag 3 trials allowed a performance-based analysis of the ERP data in the lag 3 condition (see Data Analysis for details). Meanwhile, T2 could be any one of another 9 drawings with equal probability (3 dogs, 3 cars, and 3 drums; 60% of all trials) or be blank (i.e., a white rectangle of the same size as the line drawings; 40% of all trials; see Fig. 1C). The reasons for including the "blank T2" trials are detailed in the Data Analysis section (see below). Along with the RSVP stream, a task-irrelevant natural sound (i.e., one of the 9 unique sounds [3 barks of dogs, 3 beeps of cars, and 3 beats of drums] with equal probability) was delivered synchronously with T2 on 40% of the trials; the sound could be either semantically congruent with T2 (the VAcon condition, 20% of trials; e.g., bark with dog) or incongruent with T2 (the VAincon condition, 20% of trials; e.g., bark with car). The temporal onsets of the auditory and visual stimuli were tested to guarantee that they were presented synchronously. The sound was delivered alone (synchronously with the "blank T2") on 20% of the trials (the A-only condition). On another 20% of the trials, T2 was presented without a sound (the V-only condition). On the remaining 20% of the trials, neither a sound nor a visual stimulus was presented (i.e., only the "blank T2" appeared; the No-stim condition). The experimental design therefore manipulated the factors of lag (lag 3, lag 8) and T2 stimulus type (VAcon, VAincon, V-only, A-only, and No-stim). All these trial types were presented in a randomized order. The task for participants was to discriminate sequentially, after each RSVP stream, the exact identities of both T1 and T2 as accurately as possible in an unspeeded fashion by pressing buttons on the keyboard's number pad (1–9 for T1, 0–9 for T2; see Fig. 1B,C) with the right hand, while ignoring all sounds (if delivered). A response was coded as correct only if the exact identity of T1 or T2 was discriminated. Of note, in order to prevent subjects from responding on the basis of the sounds, subjects were informed that the identity of the sound was not indicative of the exact identity of T2, even in the VA congruent condition (e.g., any of the dog barks could be presented simultaneously with any of the dog drawings). In addition, they were instructed to press the button "0" only when they were sure that they did not see T2. The keypress for T2 then triggered the next trial. The whole experiment consisted of 27 blocks of 60 trials each, resulting in 216 trials per lag 3 condition and 108 trials per lag 8 condition. Participants were allowed to rest between blocks in order to relieve fatigue.

Electrophysiological Recording and Processing

The EEG was recorded continuously from 57 tin electrodes mounted in an elastic cap (Electro-Cap International, Inc.) using a NeuroScan SynAmps system (NeuroScan, Inc.).
These electrode sites (FPz, FP1, FP2, Fz, F1, F2, F3, F4, F7, F8, FCz, FC1, FC2, FC3, FC4, FC5, FC6, Cz, C1, C2, C3, C4, C5, C6, T7, T8, CPz, CP1, CP2, CP3, CP4, CP5, CP6, TP7, TP8, Pz, P1, P2, P3, P4, P5, P6, P7, P8, POz, PO3, PO4, PO7, PO8, Oz, O1, O2, I3, I4, SI3, SI4, and the right mastoid) were positioned according to a modified 10–10 system montage (McDonald et al. 2003). All EEG electrodes were referenced to an electrode on the left mastoid during data acquisition. Horizontal eye movements were monitored via a bipolar pair of electrodes positioned at the left and right outer canthi (horizontal EOG). Vertical eye movements and blinks were monitored bipolarly by two electrodes above and below the left eye (vertical EOG). The impedances of all electrodes were kept below 5 kΩ. The online EEG and EOG signals were amplified with a gain of 10 000 and a band-pass filter of 0.05–100 Hz, and were digitized continuously at a sampling rate of 1000 Hz. In offline processing, the continuous EEG signals were first down-sampled to 250 Hz and then low-pass filtered digitally (30 Hz, 24 dB/octave) using a zero-phase-shift finite impulse response (FIR) filter to attenuate high-frequency noise from muscle activity or external electrical sources. The filtered EEG data were re-referenced to the algebraic mean of the left and right mastoid electrodes. EEG signals were then divided into 600-ms epochs time-locked to the onset of T2, with a 100-ms prestimulus baseline, and were baseline-corrected. In order to eliminate epochs contaminated by eye movements, eye blinks, and muscle activity, automatic artifact rejection was performed: epochs with voltage exceeding ±75 μV at any time point in any EEG/EOG channel were treated as contaminated and discarded. In addition, participants were excluded from further analysis if more than 40% of their epochs were discarded by artifact rejection (1 participant). Among the remaining 34 participants, 11.5 ± 1.4% (M ± SE) of the epochs were excluded by artifact rejection. Furthermore, in keeping with previous EEG studies on attentional blink (e.g., Vogel et al. 1998; Kranczioch et al. 2007; Haroush et al. 2011; Kranczioch and Thorne 2015; Maier and Rahman 2018), only trials (epochs) on which the T1 identity was correctly discriminated were analyzed further, leaving on average 170 ± 3 (M ± SE) valid epochs per lag 3 condition and 82 ± 2 valid epochs per lag 8 condition. The remaining valid epochs in each condition were then averaged separately to obtain the corresponding ERP waveforms. EEG processing and ERP analysis were carried out using "Scan" software (version 4.5, NeuroScan, Inc.).
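Although the published pipeline was run in NeuroScan's "Scan" software, the same offline steps can be approximated in a few lines of MNE-Python. The following is a minimal sketch only, assuming a continuous recording loaded as raw, an events array marking T2 onsets with an integer code T2_CODE, and mastoid channels named "M1"/"M2" (all assumed names, not from the original analysis):

    import mne

    # Assumed inputs: `raw` (mne.io.Raw) and `events` with T2 onsets coded T2_CODE.
    raw.resample(250)                                   # down-sample to 250 Hz
    raw.filter(l_freq=None, h_freq=30.0,
               method="fir", phase="zero")              # zero-phase 30-Hz low-pass FIR
    raw.set_eeg_reference(["M1", "M2"])                 # re-reference to mastoid mean

    epochs = mne.Epochs(
        raw, events, event_id={"T2": T2_CODE},
        tmin=-0.1, tmax=0.5,                            # 600-ms epochs, 100-ms baseline
        baseline=(-0.1, 0.0),
        reject=dict(eeg=150e-6, eog=150e-6),            # peak-to-peak proxy for the
        preload=True)                                   # ±75-µV absolute criterion
    erp = epochs.average()                              # averaged per condition in practice

Note that MNE's reject threshold is peak-to-peak rather than the absolute ±75-µV criterion used here, so the two rejection rules are close but not identical.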
Data Analysis

Because the focus of the current study was the neural basis of the semantic modulation of the cross-modal boost on attentional blink, which is stronger in the lag 3 than the lag 8 condition, and because many more trials were collected for the lag 3 than the lag 8 condition, only the analysis of ERP data in the lag 3 condition was planned in the present study (cf. Kranczioch and Thorne 2015). Nevertheless, after seeing the ERP results in the lag 3 condition, an exploratory analysis of the ERP data in the lag 8 condition was also performed for purposes of comparison, which can be found in the Supplementary Material, Section B: exploratory analysis of ERP data in lag 8 condition. In order to investigate the neural activities underlying the effect of audiovisual semantic congruency, ERPs elicited by the bimodal VA congruent and VA incongruent stimuli were contrasted directly. Furthermore, to explore the neural basis unaffected by audiovisual semantic congruency but underlying the cross-modal boost on attentional blink per se, the two bimodal ERPs above were then contrasted separately with the summed ERPs elicited by the unimodal V and A stimuli (i.e., VAcon vs. [V + A], VAincon vs. [V + A]). The bimodal versus summed unimodal ERP contrast was performed because significant differences revealed by this contrast have been taken to index neural activity associated with audiovisual cross-modal interactions (Giard and Peronnet 1999; Molholm et al. 2002, 2004; Teder-Sälejärvi et al. 2002, 2005; Talsma and Woldorff 2005; Bonath et al. 2007; Mishra et al. 2007, 2008, 2010; Talsma et al. 2007; Brandwein et al. 2011; Van der Burg et al. 2011; Yang et al. 2013; Gao et al. 2014; Kranczioch and Thorne 2015; Zhao et al. 2018, 2020). Prior to these comparisons, the time-locked ERPs elicited on the No-stim (i.e., blank T2 with no sound) trials were first subtracted from each of the original (VAcon, VAincon, V, and A) ERP waveforms, in order to cancel out not only any common prestimulus anticipatory activity that might extend into the poststimulus period and lead to false discovery of an early cross-modal interaction (Talsma and Woldorff 2005; Bonath et al. 2007; Mishra et al. 2007, 2008, 2010; Van der Burg et al. 2011; Zhao et al. 2018), but also the systematic superposition of ERPs elicited by the pre- and post-T2 distractors (Vogel et al. 1998; Sergent et al. 2005; Luo et al. 2010, 2013; Kranczioch and Thorne 2015; Maier and Rahman 2018). The original ERP waveforms, without subtraction of the time-locked ERPs elicited on the No-stim trials, are shown for the lag 3 condition in Supplementary Figure 1. Nevertheless, it should be noted that when conducting the bimodal versus summed unimodal ERP contrast under the present paradigm, subtracting the ERPs elicited on the No-stim trials might not entirely cancel out the "auditory" oddball effect elicited by the salient onset of the sounds in the VAcon and VAincon conditions. This is because the transient blank T2 in the RSVP stream of the No-stim condition might evoke an additional "visual" oddball effect, and the transient blank T2 paired with a salient sound in the A-only condition might induce an "audiovisual" oddball effect. Accordingly, these oddball effects would be completely balanced in the present bimodal versus summed unimodal ERP contrast (after subtracting the ERPs of the No-stim condition) only if (1) the audiovisual oddball effect was equivalent to the sum of the auditory and visual oddball effects; and (2) there was no interplay between the cross-modal interaction and the auditory oddball effect. Unfortunately, the current paradigm did not allow a direct test of these two assumptions. However, since the bimodal versus summed unimodal contrast is a widely accepted approach to isolating the neural basis of audiovisual interaction, and this contrast has been used consistently in previous multisensory ERP studies under paradigms quite similar to the present one (e.g., Van der Burg et al. 2011; Kranczioch and Thorne 2015), conducting this contrast allows us to interpret at least part of the current ERP results in light of previous studies. Importantly, the spatiotemporal dynamics of our ERP effects suggest that the core interpretations of our ERP results are unlikely to be altered by the potential problem with this contrast (for details, see Discussion).
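In array terms, the No-stim subtraction and the bimodal versus summed unimodal contrast reduce to simple waveform arithmetic. A minimal sketch, assuming per-condition, per-subject ERP averages of shape (n_channels, n_times) stored in hypothetical .npy files:

    import numpy as np

    # Assumed per-condition ERP averages for one subject, shape (n_channels, n_times).
    conditions = ["VAcon", "VAincon", "V", "A", "NoStim"]
    erp = {c: np.load(f"{c}_lag3.npy") for c in conditions}  # hypothetical file names

    # Subtract the time-locked No-stim ERP to remove anticipatory activity and
    # the overlapping responses evoked by the pre- and post-T2 distractors.
    corr = {c: erp[c] - erp["NoStim"] for c in ["VAcon", "VAincon", "V", "A"]}

    # Bimodal versus summed-unimodal difference waves probing cross-modal interaction.
    summed = corr["V"] + corr["A"]
    diff_con = corr["VAcon"] - summed      # VAcon - (V + A)
    diff_incon = corr["VAincon"] - summed  # VAincon - (V + A)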
Therefore, we still decided to report the results from the bimodal minus summed unimodal contrast, but the corresponding interpretations in terms of cross-modal interaction were treated only as suggestive rather than conclusive. To identify reliable time windows and electrode sites for measuring ERP components while controlling the problem of multiple "implicit" comparisons that could inflate the Type I error rate (Luck 2014; Luck and Gaspelin 2017), the current study first conducted repeated-measures ANOVAs with a single factor of ERP type (VAcon vs. VAincon vs. [V + A]) on individual amplitude values at all time points (4 ms each) within 100–500 ms after T2 onset for all scalp electrodes (i.e., a mass univariate analysis of the main effect of ERP type; Groppe et al. 2011a, 2011b), with the classic Benjamini and Hochberg (1995) procedure applied to control the false discovery rate (FDR), using an FDR-corrected P value of 0.05 to assess the significance of each ANOVA. Prior to the FDR correction, all P values from the ANOVAs were first corrected using the Greenhouse–Geisser method to handle any possible violation of the sphericity assumption. Note that although the summed unimodal ERPs (V + A) were also included in the ANOVAs, they would not interfere with the detection of an ERP difference between the VAcon and VAincon conditions, because a reliable difference between these two conditions, if present, would produce a significant main effect of ERP type in any case. The mass univariate analysis of the ERP type main effect with FDR correction was carried out using the MATLAB function FfdrGND in the Factorial Mass Univariate Toolbox (Fields 2017; Fields and Kuperberg 2020), which builds upon and extends the existing Mass Univariate Toolbox (Groppe et al. 2011a, 2011b) from t-test-based to ANOVA-based analysis, in conjunction with the EEGLAB (Delorme and Makeig 2004) and ERPLAB (Lopez-Calderon and Luck 2014) open-source MATLAB packages. We decided a priori not to include time points earlier than 100 ms post-T2 onset in the analysis for the following reasons: (1) an influential study systematically investigating the effect of modality-based attention on early cross-modal interaction (Talsma et al. 2007) demonstrated that cross-modal interaction earlier than 100 ms occurred when both the visual and auditory modalities were attended voluntarily, but not when only the visual modality was attended voluntarily, as in the present study; (2) the conclusion drawn by Talsma et al. (2007) is consistent not only with findings of significant cross-modal interactions earlier than 100 ms from a number of multisensory studies in which both the visual and auditory modalities were attended voluntarily (or task-relevant; e.g., Teder-Sälejärvi et al. 2002; Molholm et al. 2002; Fort et al. 2002; Cappe et al. 2010; Raij et al. 2010; Senkowski et al. 2011), but also with findings of no cross-modal interaction earlier than 100 ms from many other studies in which only the visual modality was attended or task-relevant (e.g., Mishra et al. 2007, 2008, 2010; Wu et al. 2009; Yang et al. 2013; Gao et al. 2014; Kranczioch and Thorne 2015; Zhao et al. 2018, 2020; Kaya and Kafaligonul 2019; but see Van der Burg et al. 2011); and (3) in the pioneering ERP study exploring the temporal dynamics of the cross-modal boost on attentional blink (Kranczioch and Thorne 2015), whose experimental design and task were more similar to the present study's than those of other studies, the earliest cross-modal interaction did not emerge until 96 ms. Thus, cross-modal interaction earlier than 100 ms was not expected in the present study.
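The published analysis used the MATLAB Factorial Mass Univariate Toolbox; purely to illustrate the logic, the sketch below runs a one-way repeated-measures ANOVA at every (electrode, time point) pair and applies Benjamini-Hochberg FDR across all tests. The Greenhouse-Geisser correction is omitted for brevity, and the array names are our own:

    import numpy as np
    from scipy.stats import f as f_dist

    def rm_anova_1way(data):
        """One-way repeated-measures ANOVA. data: (n_subjects, n_conditions).
        Returns the F value and its uncorrected P value."""
        n, k = data.shape
        grand = data.mean()
        ss_cond = n * ((data.mean(axis=0) - grand) ** 2).sum()
        ss_subj = k * ((data.mean(axis=1) - grand) ** 2).sum()
        ss_err = ((data - grand) ** 2).sum() - ss_cond - ss_subj
        df1, df2 = k - 1, (k - 1) * (n - 1)
        F = (ss_cond / df1) / (ss_err / df2)
        return F, f_dist.sf(F, df1, df2)

    def bh_fdr(pvals, q=0.05):
        """Benjamini-Hochberg (1995): boolean mask of tests significant at FDR q."""
        p = np.asarray(pvals)
        m = p.size
        order = np.argsort(p)
        below = p[order] <= q * (np.arange(1, m + 1) / m)
        mask = np.zeros(m, dtype=bool)
        if below.any():
            mask[order[:below.nonzero()[0].max() + 1]] = True
        return mask

    # amps: (n_subjects, 3 ERP types [VAcon, VAincon, V+A], n_channels, n_times),
    # restricted to 100-500 ms post-T2; one ANOVA per (channel, time) point.
    def mass_univariate(amps, q=0.05):
        n_ch, n_t = amps.shape[2], amps.shape[3]
        p = np.array([[rm_anova_1way(amps[:, :, c, t])[1]
                       for t in range(n_t)] for c in range(n_ch)])
        return bh_fdr(p.ravel(), q).reshape(n_ch, n_t)  # significant (channel, time) map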
After identifying several prominent spatiotemporal regions, formed by consecutive time points and adjacent electrodes, over which the point-by-point mass univariate analysis of the ERP type main effect remained significant after FDR correction, within each of these regions the mean amplitude during the significant time window (required to span at least 20 ms) was computed at the one electrode located on, or closest to (if not on), the midline of the scalp, in order to measure the corresponding underlying ERP component. The reasons for choosing the time window at a midline electrode, or the electrode closest to the midline, within each prominent spatiotemporal region were as follows. First, all stimuli were presented centrally in the present study; hence, no lateralized ERP effect was expected, and a midline electrode should provide an impartial a priori measure of the location of an underlying ERP modulation. Second, this strategy would not further increase the Type I error rate, because each time point within the chosen time windows at their respective electrodes had already survived the initial FDR-corrected mass univariate analysis. Third, we did not include the significant time points at all electrodes within a spatiotemporal region, because the mass univariate analysis (regardless of the specific approach) almost always generates significant points with at least partly asynchronous time courses across adjacent electrodes, and there may even be gaps between significant points in both time and space within a spatiotemporal region (e.g., see Groppe et al. 2011a; Delong et al. 2014; Lee and Kang 2020), which is inconvenient for any further analysis, or future study, seeking to quantify the widely used mean amplitude over an unambiguous time window and electrodes of interest. Fourth, measuring mean amplitude at a single representative electrode is also a common practice in the ERP/EEG literature (e.g., McDonald et al. 2013; Störmer et al. 2016; Pierce et al. 2018; Donohue et al. 2020; Joos et al. 2020; Volosin and Horváth 2020), which provides not only readily quantified measurement parameters for further comparisons without exceeding the data-driven spatiotemporal regions, but also unequivocal prior knowledge for follow-up studies. Last but not least, there is no fundamental difference between the current approach and those used in previous studies when determining measurement parameters for further comparisons on the basis of a mass univariate analysis, as those studies also selected the time windows and electrodes over which the data-driven spatiotemporal regions were mainly located (e.g., Delong et al. 2014; Kaya and Kafaligonul 2019; Akyuz et al. 2020; Lee and Kang 2020). The resulting mean amplitude of each ERP component was first compared between the VAcon and VAincon conditions by a one-way repeated-measures ANOVA, in order to reveal the neural substrate of the audiovisual semantic congruency effect. Furthermore, to test whether the magnitudes of these ERP components would further account for the discrimination accuracy of audiovisual T2s during the attentional blink (cf. Kranczioch and Thorne 2015), a performance-based ERP analysis was conducted for the VAcon and VAincon trials by comparing trials on which the T2 identity was correctly discriminated (correct trials) with trials on which T2 was incorrectly discriminated (incorrect trials).
Thus, a two-way repeated-measures ANOVA with factors of congruency (VAcon vs. VAincon) and accuracy (correct vs. incorrect) was performed on the mean amplitude of each ERP component. On average, there were 89 ± 5 (M ± SE) VAcon_correct trials, 78 ± 4 VAcon_incorrect trials, 82 ± 6 VAincon_correct trials, and 87 ± 5 VAincon_incorrect trials available for computing the respective ERP waveforms. Lastly, to explore the neural basis unaffected by audiovisual semantic congruency but underlying the cross-modal boost on attentional blink per se, the mean amplitude of each ERP component was additionally contrasted between VAcon versus (V + A) and VAincon versus (V + A) using separate one-way repeated-measures ANOVAs. Finally, for purposes of comparison, an exploratory analysis of the ERP data in the lag 8 condition was also performed using the same measurement parameters and statistical procedures as those for the lag 3 condition, which can be found in the Supplementary Material, Section B: exploratory analysis of ERP data in lag 8 condition.

Results

Behavioral Results

The mean accuracy of T1 discrimination across all conditions was 87.3 ± 1.2% (M ± SE, the same below). A two-way ANOVA with factors of stimulus type (V, VAcon, VAincon) and lag (lag 3, lag 8) conducted on T1 accuracy did not reveal any significant main effect [stimulus type: F(2, 66) = 0.75, P = 0.479, ηp² = 0.02; lag: F(1, 33) = 0.01, P = 0.905, ηp² = 0.0004] or interaction [F(2, 66) = 0.64, P = 0.530, ηp² = 0.02]. In contrast, the same 3 × 2 ANOVA on T2 accuracy (given correct discrimination of T1) showed a highly significant main effect of lag [F(1, 33) = 191.27, P < 0.0001, ηp² = 0.85], with much lower T2 accuracy at lag 3 (48.6 ± 2.6%) than at lag 8 (62.6 ± 2.5%), indicative of a reliable attentional blink effect (Raymond et al. 1992) in the present study (Fig. 2). In addition, the main effect of stimulus type [F(2, 66) = 15.70, P < 0.0001, ηp² = 0.32] and, more importantly, the stimulus type × lag interaction [F(2, 66) = 9.19, P = 0.0003, ηp² = 0.22] were also highly significant. Specific comparisons revealed that the main effect of stimulus type was in fact nonsignificant in the lag 8 condition [F(2, 66) = 0.40, P = 0.669, ηp² = 0.01; V: 62.3 ± 2.8%; VAcon: 63.2 ± 2.3%; VAincon: 62.4 ± 2.6%]. However, the main effect of stimulus type was highly significant in the lag 3 condition [F(2, 66) = 30.84, P < 0.0001, ηp² = 0.48], with enhanced T2 accuracy for both the VAcon (52.5 ± 2.5%) and VAincon (47.8 ± 2.7%) conditions relative to the V (45.6 ± 2.7%) condition (both Ps < 0.006, ds > 0.50), and with higher T2 accuracy for the VAcon than the VAincon condition (P < 0.0001, d = 0.84) as well (see Fig. 2, left half). These results not only replicate the classic cross-modal boost on attentional blink (Olivers and Van der Burg 2008; Kranczioch and Thorne 2013, 2015) and the semantic congruency effect on it (Adam and Noppeney 2014), but also indicate, for the first time, that even a semantically incongruent sound that is task-irrelevant can bleep T2 out of the attentional blink.

Figure 2. Mean accuracy of T2 discrimination (given T1 correct) as a function of lag (lag 3, lag 8) and stimulus type (visual-only [V], congruent audiovisual [VAcon], incongruent audiovisual [VAincon]). Both the semantically congruent and incongruent sounds improved T2 discrimination at lag 3, but the enhancement was significantly larger for the congruent sounds. Error bars represent ±1 SE; **P < 0.01; ***P < 0.001.
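For reference, the 3 × 2 within-subjects ANOVA above can be reproduced with, for example, statsmodels (the original analysis software is not specified here). A minimal sketch, assuming a long-format table df with one row per subject × stimulus type × lag cell and hypothetical column names:

    import pandas as pd
    from statsmodels.stats.anova import AnovaRM

    # df columns (all names hypothetical): "subject", "stim_type" (V/VAcon/VAincon),
    # "lag" (3/8), and "t2_acc" (mean T2 accuracy given correct T1 in that cell).
    res = AnovaRM(data=df, depvar="t2_acc", subject="subject",
                  within=["stim_type", "lag"]).fit()
    print(res.anova_table)  # F tests for stim_type, lag, and their interaction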
ERP Results

The mass univariate analysis of the main effect of ERP type (VAcon vs. VAincon vs. [V + A]) in the lag 3 condition, with FDR correction for multiple comparisons (Fig. 3), showed that the earliest significant main effect of ERP type occurred around 200 ms post-T2 over two occipital electrodes, O1 (primarily) and O2. According to the quantification criterion (see Data Analysis for details), the corresponding ERP component was measured as the mean amplitude during the exact significant time window of 192–228 ms at the O1 electrode (see Fig. 3). Further comparison of the mean amplitudes of this negative-going deflection (labeled N195 for convenience; Fig. 4A), however, showed no significant difference between the congruent and incongruent bimodal ERP waveforms [VAcon vs. VAincon: F(1, 33) = 0.84, P = 0.366, ηp² = 0.02; VAincon − VAcon = −0.10 ± 0.11 μV (M ± SE, the same below)]. A subsequent significant main effect of ERP type extended from approximately 330 to 400 ms and beyond, and was distributed mainly over fronto-central electrodes, including the midline electrode Fz (Fig. 3). Thus, the underlying ERP component was quantified as the mean amplitude during the exact significant time window of 352–396 ms at the Fz electrode. Further comparison of the mean amplitudes of this positive-going ERP (labeled P360; Fig. 4B) did not reveal any significant difference between the congruent and incongruent bimodal ERP waveforms either [VAcon vs. VAincon: F(1, 33) = 0.09, P = 0.772, ηp² = 0.003; VAincon − VAcon = −0.08 ± 0.29 μV].

Figure 3. Spatiotemporal distribution of significant (FDR-corrected P < 0.05) time points (4 ms each, indexed by white squares) obtained via the point-by-point mass univariate analysis of the main effect of ERP type (VAcon vs. VAincon vs. [V + A]) on ERP data in the lag 3 condition from 100 to 500 ms post-T2 onset (abscissa) across all electrode sites (ordinate), with the 0.05-level FDR correction (Benjamini and Hochberg 1995) for multiple comparisons. Left scalp electrodes are depicted uppermost, midline scalp electrodes in the center, and right scalp electrodes in the lower portion. For the same mass univariate analysis applied to data with a sampling rate of 500 Hz, see Supplementary Figure 2.
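Measuring a component this way reduces to averaging one electrode's waveform over the FDR-identified window. A minimal sketch, assuming an ERP array erp of shape (n_channels, n_times), a channel list ch_names, and a time vector times in seconds (all hypothetical names):

    import numpy as np

    def mean_amplitude(erp, ch_names, times, electrode, t_start, t_end):
        """Mean amplitude over [t_start, t_end] at a single representative electrode."""
        ch = ch_names.index(electrode)
        window = (times >= t_start) & (times <= t_end)
        return erp[ch, window].mean()

    # e.g., the occipital N195 quantified during 192-228 ms at O1,
    # and the fronto-central P360 during 352-396 ms at Fz:
    n195 = mean_amplitude(erp, ch_names, times, "O1", 0.192, 0.228)
    p360 = mean_amplitude(erp, ch_names, times, "Fz", 0.352, 0.396)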
Figure 4. Grand-averaged ERP waveforms in the lag 3 condition elicited by T2s paired with semantically congruent sounds (VAcon) and T2s paired with incongruent sounds (VAincon), together with the sum of the unisensory ERP waveforms elicited by T2s alone and sounds alone (V + A), shown from the O1 electrode for the N195 component (A), the Fz electrode for the P360 component (B), and the P1 electrode for the N440 component (C). The shaded areas on the ERP waveforms depict the time windows (192–228, 352–396, and 424–448 ms) within which the mean amplitudes of these ERP difference components were quantified for further comparisons. Scalp topographies are shown separately for the (VAincon − VAcon), (VAcon − [V + A]), and (VAincon − [V + A]) difference amplitudes within each of the shaded time intervals. The white dot on each scalp topography marks the electrode at which the mean amplitude of each ERP component was quantified. Note that each time point within the three time windows at their respective electrodes had survived the initial mass univariate analysis of the main effect of ERP type with FDR correction (see Fig. 3). Paired observation graphs are also shown for the mean amplitudes of these ERP difference components, with data from individual subjects displayed in gray and group-averaged data marked by black symbols. Error bars correspond to ±1 SE; **P < 0.01; ***P < 0.001; n.s.: nonsignificant.

After 400 ms post-T2 onset, the last significant main effect of ERP type centered around 440 ms and was distributed over left parietal electrodes (Fig. 3). Since P1 was the electrode closest to the midline of the scalp among these parietal electrodes, the corresponding ERP was quantified as the mean amplitude within the precise significant time window of 424–448 ms at the P1 electrode. Further comparison of the mean amplitudes (Fig. 4C) revealed a significantly larger negative-going amplitude in response to the incongruent than the congruent audiovisual T2s
[VAincon vs. VAcon: F(1, 33) = 15.73, P = 0.0004, ηp² = 0.32; VAincon − VAcon = −0.73 ± 0.18 μV]. This finding suggests that the behavioral effect of semantic congruency on the cross-modal boost of attentional blink (indexed by higher T2 accuracy for the VA congruent than incongruent stimuli in the lag 3 condition) might be associated with this late occurring negative ERP effect (labeled N440 for convenience). For the single-subject version of the ERP waveforms in Fig. 4, see Supplementary Figure 3. In order to test whether the magnitudes of the ERP components above would further account for the discrimination accuracy of audiovisual T2s during the attentional blink, a performance-based ERP analysis was conducted for the VAcon and VAincon trials in the lag 3 condition by comparing trials on which the T2 identity was correctly discriminated (correct trials) with trials on which T2 was incorrectly discriminated (incorrect trials). The congruency (VAcon, VAincon) × accuracy (correct, incorrect) ANOVA performed on the occipital N195 component (quantified as the mean amplitude during 192–228 ms at the O1 electrode; Fig. 5A) revealed a significant main effect of accuracy [F(1, 33) = 10.68, P = 0.003, ηp² = 0.24], with the N195 amplitude being larger on correct than incorrect trials (correct − incorrect = −0.58 ± 0.18 μV [M ± SE, the same below]). However, neither the main effect of congruency [F(1, 33) = 0.81, P = 0.374, ηp² = 0.02; VAincon − VAcon = −0.11 ± 0.13 μV] nor the congruency × accuracy interaction [F(1, 33) = 0.28, P = 0.603, ηp² = 0.01] was significant. These results indicate that trial-to-trial fluctuations of the N195 amplitude underlie the discrimination accuracy of audiovisual T2s, independently of audiovisual semantic congruency. After the occipital N195 component, however, the 2 × 2 ANOVA on the fronto-central P360 component (quantified as the mean amplitude during 352–396 ms at the Fz electrode) did not reveal any significant main effect [accuracy: F(1, 33) = 1.10, P = 0.301, ηp² = 0.03; congruency: F(1, 33) = 0.0003, P = 0.986, ηp² < 0.0001] or interaction [F(1, 33) = 0.52, P = 0.477, ηp² = 0.02], suggesting that the P360 amplitude is not related to the discrimination accuracy of audiovisual T2s.

Figure 5. Performance-based analysis of the N195 component (A) and the N440 component (B) in the lag 3 condition, as a function of audiovisual congruency (VAcon, VAincon) and T2 accuracy (correct, incorrect). The ERP waveforms shown here are from the O1 electrode for the N195 (192–228 ms) and the P1 electrode for the N440 (424–448 ms), as in Figure 4.
Topographical voltage distributions are shown for the correct-minus-incorrect difference amplitude within the N195 interval (A) and for the VAincon-minus-VAcon difference amplitude within the N440 interval (B). A larger N195 amplitude was found on T2 correct than T2 incorrect trials irrespective of audiovisual congruency, whereas a greater N440 amplitude was found for the incongruent than the congruent audiovisual T2s only on incorrect trials.

For the late occurring N440 component (quantified as the mean amplitude during 424–448 ms at the P1 electrode; Fig. 5B), both the main effect of accuracy [F(1, 33) = 17.10, P = 0.0002, ηp² = 0.34] and the main effect of congruency [F(1, 33) = 14.46, P = 0.001, ηp² = 0.31] were significant. More importantly, there was a significant congruency × accuracy interaction on the N440 amplitude [F(1, 33) = 4.23, P = 0.048, ηp² = 0.11]. Further analysis of this two-way interaction revealed that the N440 semantic congruency effect (i.e., larger N440 amplitude for the incongruent than the congruent audiovisual T2s) observed in the initial analysis above was in fact significant on incorrect trials [F(1, 33) = 19.84, P < 0.0001, ηp² = 0.38; VAincon − VAcon = −1.12 ± 0.25 μV] but not on correct trials [F(1, 33) = 0.86, P = 0.361, ηp² = 0.03; VAincon − VAcon = −0.28 ± 0.30 μV]. These results provide further evidence that the late occurring N440 component is associated with the incorrect discrimination triggered by the semantically incongruent audiovisual T2s during the attentional blink interval. To further explore the potential neural basis unaffected by audiovisual semantic congruency but underlying the cross-modal boost on attentional blink per se, the mean amplitude of each ERP component was additionally contrasted between VAcon versus (V + A) and VAincon versus (V + A) using separate one-way repeated-measures ANOVAs. The results showed that the N195 amplitudes in both the congruent and incongruent bimodal ERP waveforms were significantly larger than that in the summed unimodal waveform [VAcon vs. (V + A): F(1, 33) = 12.83, P = 0.001, ηp² = 0.28, VAcon − (V + A) = −0.63 ± 0.17 μV; VAincon vs. (V + A): F(1, 33) = 16.74, P = 0.0003, ηp² = 0.34, VAincon − (V + A) = −0.73 ± 0.18 μV; Fig. 4A]. These findings suggest a possible early cross-modal interaction during the occipital N195 interval, which might underlie the behavioral pattern whereby both the semantically congruent and incongruent sounds improved T2 discrimination at lag 3. Similarly, the P360 amplitudes in both the congruent and incongruent bimodal ERP waveforms were significantly greater than that in the summed unimodal waveform [VAcon vs. (V + A): F(1, 33) = 12.52, P = 0.001, ηp² = 0.28, VAcon − (V + A) = 1.28 ± 0.36 μV; VAincon vs. (V + A): F(1, 33) = 11.84, P = 0.002, ηp² = 0.26, VAincon − (V + A) = 1.20 ± 0.35 μV; Fig. 4B]. However, given its independence from the discrimination accuracy of audiovisual T2s at lag 3, the possible cross-modal interaction during the fronto-central P360 appears less likely to contribute to the cross-modal boost on attentional blink. Lastly, for purposes of comparison only, the late N440 amplitude was significantly larger in the bimodal than the summed unimodal ERP waveform solely for the incongruent audiovisual T2s [VAincon vs. (V + A): F(1, 33) = 11.78, P = 0.002, ηp² = 0.26, VAincon − (V + A) = −1.11 ± 0.32 μV] and not for the congruent audiovisual T2s [VAcon vs. (V + A): F(1, 33) = 1.46, P = 0.235, ηp² = 0.04, VAcon − (V + A) = −0.38 ± 0.32 μV; Fig. 4C].
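In implementation terms, the performance-based analysis simply re-bins each subject's artifact-free, T1-correct bimodal epochs by congruency and T2 accuracy before averaging. A minimal sketch, with all array names hypothetical:

    import numpy as np

    def performance_split(epochs, t1_correct, t2_correct, congruent):
        """Average T1-correct epochs into the four congruency x accuracy cells.
        epochs: (n_trials, n_channels, n_times); the other arguments are
        per-trial boolean arrays (congruent: True = VAcon, False = VAincon)."""
        cells = {}
        for cond_label, cond in (("VAcon", True), ("VAincon", False)):
            for acc_label, acc in (("correct", True), ("incorrect", False)):
                mask = t1_correct & (congruent == cond) & (t2_correct == acc)
                cells[f"{cond_label}_{acc_label}"] = epochs[mask].mean(axis=0)
        return cells  # four ERPs for the 2 x 2 congruency x accuracy ANOVA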
Discussion

Previous studies have shown that the visual attentional blink can be substantially reduced by presenting a task-irrelevant sound synchronously with T2 (Olivers and Van der Burg 2008; Kranczioch and Thorne 2013), and that this auditory benefit on attentional blink can be modulated by the semantic congruency between T2 and the simultaneous sound (Adam and Noppeney 2014). The present ERP study explored the neural basis underlying this cross-modal boost on attentional blink and the effect of audiovisual congruency on it in a visual object-recognition task under the attentional blink paradigm, wherein task-irrelevant but natural sounds could be either presented synchronously with T2s or absent, and could be either semantically congruent or incongruent with T2s (e.g., barks with dogs, or barks with cars). The behavioral results showed that both the congruent and incongruent sounds, relative to the no sound (visual-only T2) condition, improved T2 discrimination at lag 3, but the enhancement was significantly larger for the congruent sounds. These findings not only replicate the classic cross-modal boost on attentional blink (Olivers and Van der Burg 2008; Kranczioch and Thorne 2013) and the semantic congruency effect on it (Adam and Noppeney 2014), but also demonstrate, for the first time, that even a semantically incongruent sound can bleep T2 out of the attentional blink. The ERP results first showed that the occipital N195 component did not differ between the congruent and incongruent audiovisual T2s in the lag 3 condition. The performance-based analysis revealed that the N195 amplitude was in fact enhanced on correct relative to incorrect trials for both the congruent and incongruent audiovisual T2 objects at lag 3. The additional bimodal versus summed unimodal ERP contrasts suggest a possible early cross-modal interaction during the occipital N195 interval. Collectively, these findings indicate that the occipitally distributed N195 component might be the proximate trigger for the occurrence of the cross-modal boost on the attentional blink, in close agreement with the pattern of the present behavioral results that both the congruent and incongruent sounds improved T2 discrimination at lag 3. Furthermore, as the timing of the audiovisual N195 was similar to that of the visual N1 component elicited by the unimodal visual T2s (see Supplementary Fig. 1B, left, red trace), it is possible that the current bimodal-minus-unimodal difference during the N195 time interval reflects a sound-induced, superadditive enhancement of the visual N1 component. (It is noteworthy that the timing of the current visual N1 component elicited by the unimodal visual T2s was somewhat later than the typical visual N1 latency in the literature. However, the latency of the visual N1 has been shown to be delayed during more demanding visual tasks (Fort et al. 2005). Given that the overall T2 discrimination accuracy was rather low (i.e., task difficulty was high) in the present study (indexed especially by only ~62% T2 accuracy in the lag 8 condition, where visual processing should be outside the attentional blink interval), we speculate that the delay of the present visual N1 latency can be attributed to the high difficulty of the current RSVP task.) In addition, it appears that the visual-evoked N1 in response to the unimodal visual T2s (Supplementary Fig. 1C, left) had a somewhat more lateral (rightward) and posterior distribution than the audiovisual N195 (Fig. 4A).
Since the visual N1 at posterior electrodes is known to have a lateral occipital subcomponent as well as a medial parietal subcomponent (Mangun 1995), the present audiovisual N195 effect might be a combination of sound-induced modulations of both the medial parietal and lateral occipital visual N1 subcomponents, with the parietal N1 enhancement being relatively dominant. In any case, given that the visual N1 component has been thought to play a substantial role in early visual discrimination processing (Tanaka et al. 1999; Vogel and Luck 2000; Hopf et al. 2002; Fort et al. 2005), the occipital audiovisual N195 found in the current study may indicate that the synchronous sound bleeped T2 out of the attentional blink by cross-modally strengthening the early visual discrimination processing of T2. It is noteworthy that, inconsistent with the behavioral finding of higher T2 accuracy at lag 3 for the congruent than the incongruent sounds, there was no substantial difference in the audiovisual N195 amplitude between the congruent and incongruent audiovisual T2s in the lag 3 condition, and the accuracy-contingent N195 amplitude was observed irrespective of the semantic congruency of the audiovisual T2 objects. This disparity suggests that the effect of semantic congruency does not unfold at the relatively early stage of visual discrimination processing, which is in accordance with findings from similar prior studies in which only the visual modality was attended (e.g., Yuval-Greenberg and Deouell 2007; Sinke et al. 2014). After the initial 200 ms, the later fronto-central P360 component did not differ between the congruent and incongruent audiovisual T2s at lag 3 either. The further performance-based analysis also showed no P360 amplitude difference between correct and incorrect trials for either the congruent or the incongruent audiovisual T2 objects at lag 3, although the additional bimodal versus summed unimodal ERP contrasts imply a possible cross-modal interaction occurring within the P360 interval. A similar cross-modal positive difference component over fronto-central scalp within approximately 300–400 ms has been reported in a previous cross-modal study that also analyzed ERPs beyond 300 ms after stimulus onset (Talsma and Woldorff 2005). Given its late timing and anteriorly maximal topography, this component was hypothesized to reflect integrative processes originating from high-level association brain areas (Talsma and Woldorff 2005). If that is the case, the present findings suggest that the fronto-central P360 component might reflect general aspects of the multisensory integration elicited by audiovisual stimuli, which seems unaffected by audiovisual semantic congruency (at least during the attentional blink interval) and not associated with the improvements in T2 discrimination accuracy induced by the sounds. The effect of semantic congruency on the cross-modal boost on attentional blink was not revealed in the ERP waveforms until approximately 400 ms after T2 onset, manifesting as a larger centro-parietal N440 component for the incongruent than the congruent audiovisual T2s at lag 3. The performance-based analysis further revealed that the N440 semantic congruency effect (i.e., larger N440 amplitude for the incongruent than the congruent audiovisual T2s) was in fact evident only on incorrect trials and not on correct trials, indicating that the N440 component is associated with the incorrect discrimination triggered by the semantically incongruent audiovisual T2s.
A previous ERP study (Molholm et al. 2004) using simultaneously presented, meaningful visual and auditory stimuli also observed an audiovisual semantic congruency ERP effect analogous (if not identical) to the present N440 effect, which was interpreted as belonging to the class of N400 components that are sensitive to semantic mismatch (Molholm et al. 2004). Therefore, the present N440 component might also be a manifestation of the classic N400 component. If that is the case, then, because a prominent hypothesis holds that the N400 effect reflects higher order semantic analysis occurring after the meanings of stimuli have been at least partly identified (Kutas and Hillyard 1980; Vogel et al. 1998; Kutas and Federmeier 2000), the present N440 results echo the pattern of the occipital N195 results and indicate that, unlike the relatively early processing locus of the cross-modal boost on attentional blink per se, the additional audiovisual semantic congruency effect has a late processing locus, possibly at a higher order stage of semantic analysis. It is worth mentioning that the present N440 measurement window (424–448 ms) was narrower than the time window used for quantifying the N400-like effect (400–500 ms) in the study of Molholm et al. (2004). The reason for this disparity most likely lies in the distinct approaches to identifying measurement windows. Specifically, the current N440 window was determined on the basis of the results from the point-by-point mass univariate analysis with FDR correction for multiple implicit comparisons. Thus, it is possible that some time points at both ends of the N440 period that were significant before FDR correction were no longer significant after FDR correction, thereby narrowing the final measurement window. In contrast, the measurement window of the N400-like effect reported by Molholm et al. (2004) was determined by visual inspection of the observed difference between their VAincon and VAcon conditions, without correction for multiple implicit comparisons, which might have broadened their measurement window. Indeed, this interpretation is supported by the present observation that the ERP waveforms elicited by the VAcon and VAincon stimuli actually began to diverge before 400 ms and continued to diverge until 500 ms (see Fig. 4C, blue and green traces), which fits well with the time course of the N400-like effect reported by Molholm et al. (2004). In addition, the scalp topography of the current N440 seemed more posteriorly distributed than that of the N400-like effect observed by Molholm et al. (2004), which might be attributed, at least in part, to the coexistence of the sustained positive P360 amplitude over fronto-central electrodes during 400–500 ms when the current negative N440 effect is displayed as the bimodal-minus-unimodal difference amplitude (see Fig. 4C). When the N440 topography was plotted using the VAincon-minus-VAcon difference amplitude (see Fig. 5B), as done by Molholm et al. (2004), it indeed became somewhat more anteriorly distributed. Nevertheless, after further inspection of the ERP waveforms at the P1 electrode (Fig. 4C), an alternative argument is that the present N440 effect might not be a manifestation of the classic N400 effect, but might instead merely reflect a decreased P3b amplitude in response to the incongruent relative to the congruent audiovisual T2s. If this were the case, based on the well-known finding that the attentional blink is closely associated with the suppression of the P3b amplitude (Vogel et al. 1998; Sergent et al. 2005; Kranczioch et al.
In addition, the scalp topography of the current N440 seemed more posteriorly distributed than that of the N400-like effect observed by Molholm et al. (2004), which might be attributed, at least in part, to the coexistence of the sustained positive P360 amplitude over fronto-central electrodes during 400–500 ms when the current negative N440 effect is displayed as the bimodal-minus-unimodal difference amplitude (see Fig. 4C). When the N440 topography was instead plotted using the VAincon-minus-VAcon difference amplitude (see Fig. 5B), as employed by Molholm et al. (2004), it indeed became somewhat more anteriorly distributed. Nevertheless, after further inspection of the ERP waveforms at the P1 electrode (Fig. 4C), an alternative argument is that the present N440 effect might not be a manifestation of the classic N400 effect, but might instead merely reflect a decreased P3b amplitude in response to the incongruent relative to congruent audiovisual T2s. If this were the case, based on the well-known finding that the attentional blink is closely associated with the suppression of P3b amplitude (Vogel et al. 1998; Sergent et al. 2005; Kranczioch et al. 2007), we should also have predicted that the P3b amplitude in the summed unimodal ERP waveforms at lag 3 would be the smallest (VAcon vs. VAincon vs. [V + A]), given that the present T2 accuracy in the V condition at lag 3 was the lowest (i.e., the attentional blink in the V condition was the largest [VAcon vs. VAincon vs. V]; see Fig. 2). The current ERP data are clearly inconsistent with this prediction in that the positive amplitude in the summed unimodal ERP waveforms was actually not the smallest (see Fig. 4C). Thus, the present N440 effect seems unlikely to reflect a decreased P3b amplitude in response to the incongruent relative to congruent audiovisual T2s. Even if the N440 effect did reflect a decreased P3b amplitude, the present conclusion that the semantic congruency effect on the cross-modal boost on attentional blink has a late processing locus would not be challenged, because the suppressed P3b during the attentional blink has itself been interpreted as an impairment at a late stage of processing (e.g., Vogel et al. 1998; Sergent et al. 2005; Kranczioch et al. 2007). However, further research is needed to distinguish the specific roles that the N400 and P3b components play in the semantic congruency effect on the cross-modal boost on attentional blink.

As noted in the data analysis, the validity of the bimodal versus summed unimodal contrast for isolating ERP correlates of cross-modal interactions holds only if (1) the audiovisual oddball effect is equivalent to the sum of the auditory and visual oddball effects, and (2) there is no interplay between cross-modal interaction and the auditory oddball effect. Since the current experimental paradigm did not allow a direct test of these two assumptions, the possibility of a mixture of cross-modal interaction and oddball effects could not be ruled out. However, this issue seems unlikely to substantially alter the main interpretations of the current findings, for several reasons. First, although it has been reported that the audiovisual oddball effect is not equal to the sum of the auditory and visual oddball effects, the corresponding ERP difference was found to occur at approximately 180–220 ms after oddball onset over left parieto-temporal electrodes (Besle et al. 2005). Accordingly, if any unbalanced oddball effect remained in the present bimodal versus summed unimodal contrast, it should have been evident in our data-driven mass univariate analysis (Fig. 3) as a left parieto-temporal distribution around 200 ms post-T2 onset, but it was not. Second, although the unbalanced oddball effect reported previously (Besle et al. 2005) has similar timing to the present N195 effect revealed in the bimodal versus summed unimodal contrast, its left parieto-temporal distribution is distinct from the occipital distribution of the present N195 (Fig. 4A); hence, the present N195 effect seems more likely to originate from cross-modal interaction than from an unbalanced oddball effect. Indeed, an occipital cross-modal negative difference similar to the current N195 has been consistently reported in previous studies that conducted the bimodal versus summed unimodal contrast under classic audiovisual paradigms (e.g., Brandwein et al. 2011, 2013; Molholm et al. 2020). Therefore, the present interpretation that early cross-modal interaction over visual areas underlies the cross-modal boost on attentional blink appears unlikely to be challenged.
Third, since the neural basis of the interplay between cross-modal interaction and the auditory oddball effect has been reported to occur within approximately 300–400 ms in the posterior/superior temporal gyrus (Tse et al. 2015), and the scalp topography of ERPs generated in posterior/superior temporal cortex is typically fronto-centrally distributed (Mishra et al. 2007, 2008, 2010), it is possible that the present fronto-central P360 effect revealed in the bimodal versus summed unimodal contrast is not a pure cross-modal interaction. Note, however, that the present P360 amplitude was not associated with the discrimination accuracy of audiovisual T2s at lag 3; hence, even if there were an interplay between cross-modal interaction and the auditory oddball effect, it would not alter our interpretation of the cross-modal boost on attentional blink. Fourth, although we also conducted the bimodal versus summed unimodal contrast for the present N440, the more direct VAincon versus VAcon contrast was the core basis for interpreting the N440 results; thus, the interpretation that the semantic congruency effect on the cross-modal boost on attentional blink is associated with the N440 is not undermined. Having said that, a substantial refinement of the present paradigm is still needed in future work to separate purer neural correlates of cross-modal interaction from those of the oddball effect.

Based on the present behavioral and electrophysiological findings, we propose a "hierarchical model" for the cross-modal boost on attentional blink: a task-irrelevant but simultaneous sound, irrespective of its semantic relevance, first enables T2 to escape the attentional blink by cross-modally strengthening the relatively early stage of visual object-recognition processing, whereas the semantic conflict of the sound begins to interfere with visual awareness only at a later stage, when the representation of the visual object is identified and extracted. These findings add to the existing studies of how multisensory interaction benefits the allocation of attention (Van Vleet and Robertson 2006; Iordanescu et al. 2008; Olivers and Van der Burg 2008; Van der Burg et al. 2008, 2011; Kranczioch and Thorne 2013, 2015; Adam and Noppeney 2014) and extend our understanding of the mechanisms of the attentional blink phenomenon per se. As noted in the Introduction, the attentional blink has been hypothesized to reflect an impairment mainly at the late, postperceptual stage of processing (for review, see Dux and Marois 2009). However, the current ERP data show that the enhancement of early visual object-recognition processing induced by a task-irrelevant sound can also relieve the attentional blink. This suggests that the strengthened visual representation of T2 becomes more durable, raising the probability that T2 will be further processed at the postperceptual stage (i.e., working memory) for conscious report (for similar proposals, see Chun and Potter 1995; Shapiro et al. 1997; Olivers and Van der Burg 2008; Adam and Noppeney 2014).

Nevertheless, a largely overlooked issue in the attentional blink literature (including the present study) is that the predictability of T2 differs between short and long lags (relative to T1). Take the current study for instance: because T2 was presented at either lag 3 or lag 8, T2 might or might not appear at lag 3 on a given trial, but if it did not appear at lag 3, it was certain to appear at lag 8.
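To make this contingency explicit, here is a toy calculation assuming, purely for illustration, equal numbers of lag 3 and lag 8 trials; the actual trial proportions in the experiment may differ, but the asymmetry holds for any design in which T2 always appears at one of the two lags.

```python
from fractions import Fraction

# Toy design: T2 appears at lag 3 or lag 8, assumed equiprobable here.
p_lag3 = Fraction(1, 2)
p_lag8 = Fraction(1, 2)

# Before the lag 3 position, T2's timing is uncertain:
print(p_lag3)  # P(T2 at lag 3) = 1/2

# Once a distractor (not T2) has appeared at the lag 3 position, the only
# remaining possibility is lag 8, so T2 becomes fully predictable:
p_lag8_given_no_lag3 = p_lag8 / (1 - p_lag3)
print(p_lag8_given_no_lag3)  # P(T2 at lag 8 | no T2 at lag 3) = 1
```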
Because of this asymmetry, when subjects encountered a lag 8 trial, their transient awareness of a distractor at the lag 3 position might have led them to better prepare for (i.e., attend to) the upcoming T2, which was certain to appear at lag 8; this could also account for the higher T2 accuracy at lag 8 than at lag 3 (Fig. 2). Although this issue does not affect our interpretation of the cross-modal boost on T2 accuracy at lag 3, future research with a refined paradigm is needed to control this predictability effect more properly. The inclusion of a condition in which no T2 is presented and all stimuli after T1 are distractors might help reduce the difference in predictability between T2 at short and long lags.

Conclusion

In summary, the current study investigated the electrophysiological time course of the cross-modal boost on attentional blink and when this auditory benefit would be modulated by audiovisual semantic congruency. The behavioral results not only replicated the classic cross-modal boost on attentional blink and the semantic congruency effect on it, but also indicated for the first time that even a semantically incongruent sound can bleep T2 out of the attentional blink. The ERP results revealed that the early cross-modal interaction manifested by the occipital N195 component might be the proximate trigger of the auditory benefit on attentional blink, whereas the additional semantic congruency effect was accompanied only by the late-occurring N440 effect. These findings suggest a hierarchical model of the cross-modal boost on attentional blink: an early stage that is independent of cross-modal congruency and a later stage that is modulated by it.

Notes

We thank Eric Fields for sending us the latest version of the Factorial Mass Univariate Toolbox (v0.5.0) before its public release, with an excellent new feature of correction for nonsphericity prior to FDR correction. Conflict of Interest: The authors declare no potential conflict of interest.

Funding

National Natural Science Foundation of China (31771200 to W.F.F.); the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB32040200 to Y.J.W.).

Author Contributions

S.Z., Y.W., and W.F. designed the research; S.Z. and X.H. performed the research; S.Z., C.F., and W.F. analyzed the data; S.Z., Y.W., and W.F. wrote the paper.

References

Adam R, Noppeney U. 2014. A phonologically congruent sound boosts a visual target into perceptual awareness. Front Integr Neurosci. 8:70.
Akyuz S, Pavan A, Kaya U, Kafaligonul H. 2020. Short- and long-term forms of neural adaptation: an ERP investigation of dynamic motion aftereffects. Cortex. 125:122–134.
Benjamini Y, Hochberg Y. 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J Roy Stat Soc B Stat Meth. 57:289–300.
Besle J, Fort A, Giard MH. 2005. Is the auditory sensory memory sensitive to visual information? Exp Brain Res. 166:337–344.
Bonath B, Noesselt T, Martinez A, Mishra J, Schwiecker K, Heinze HJ, Hillyard SA. 2007. Neural basis of the ventriloquist illusion. Curr Biol. 17:1697–1703.
Brandwein AB, Foxe JJ, Russo NN, Altschuler TS, Gomes H, Molholm S. 2011. The development of audiovisual multisensory integration across childhood and early adolescence: a high-density electrical mapping study. Cereb Cortex. 21:1042–1055.
Brandwein AB, Foxe JJ, Butler JS, Russo NN, Altschuler TS, Gomes H, Molholm S. 2013. The development of multisensory integration in high-functioning autism: high-density electrical mapping and psychophysical measures reveal impairments in the processing of audiovisual inputs. Cereb Cortex. 23:1329–1341.
Busse L, Roberts KC, Crist RE, Weissman DH, Woldorff MG. 2005. The spread of attention across modalities and space in a multisensory object. Proc Natl Acad Sci USA. 102:18751–18756.
Cappe C, Thut G, Romei V, Murray MM. 2010. Auditory-visual multisensory interactions in humans: timing, topography, directionality, and sources. J Neurosci. 30:12572–12580.
Chen YC, Spence C. 2010. When hearing the bark helps to identify the dog: semantically-congruent sounds modulate the identification of masked pictures. Cognition. 114:389–404.
Chen YC, Spence C. 2013. The time-course of the cross-modal semantic modulation of visual picture processing by naturalistic sounds and spoken words. Multisens Res. 26:371–386.
Chen YC, Spence C. 2018. Audiovisual semantic interactions between linguistic and nonlinguistic stimuli: the time-courses and categorical specificity. J Exp Psychol Hum Percept Perform. 44:1488–1507.
Chun MM, Potter MC. 1995. A two-stage model for multiple target detection in RSVP. J Exp Psychol Hum Percept Perform. 21:109–127.
Delong KA, Quante L, Kutas M. 2014. Predictability, plausibility, and two late ERP positivities during written sentence comprehension. Neuropsychologia. 61:150–162.
Delorme A, Makeig S. 2004. EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. J Neurosci Methods. 134:9–21.
De Meo R, Murray MM, Clarke S, Matusz PJ. 2015. Top-down control and early multisensory processes: chicken vs. egg. Front Integr Neurosci. 9:17.
Di Lollo V, Kawahara J, Ghorashi SMS, Enns JT. 2005. The attentional blink: resource depletion or temporary loss of control? Psychol Res. 69:191–200.
Donohue SE, Harris JA, Loewe K, Hopf JM, Heinze HJ, Woldorff MG, Schoenfeld MA. 2020. Electroencephalography reveals a selective disruption of cognitive control processes in craving cigarette smokers. Eur J Neurosci. 51:1087–1105.
Dux PE, Marois R. 2009. The attentional blink: a review of data and theory. Atten Percept Psychophys. 71:1683–1700.
Fiebelkorn IC, Foxe JJ, Molholm S. 2010. Dual mechanisms for the cross-sensory spread of attention: how much do learned associations matter? Cereb Cortex. 20:109–120.
Fields EC. 2017. Factorial Mass Univariate ERP Toolbox [Computer Software]. https://github.com/ericcfields/FMUT/releases.
Fields EC, Kuperberg GR. 2020. Having your cake and eating it too: flexibility and power with mass univariate statistics for ERP data. Psychophysiology. 57:e13468.
Fort A, Besle J, Giard MH, Pernier J. 2005. Task-dependent activation latency in human visual extrastriate cortex. Neurosci Lett. 379:144–148.
Fort A, Delpuech C, Pernier J, Giard MH. 2002. Dynamics of cortico-subcortical cross-modal operations involved in audio-visual object detection in humans. Cereb Cortex. 12:1031–1039.
Gao Y, Li Q, Yang W, Yang J, Tang X, Wu J. 2014. Effects of ipsilateral and bilateral auditory stimuli on audiovisual integration: a behavioral and event-related potential study. Neuroreport. 25:668–675.
Giard MH, Peronnet F. 1999. Auditory-visual integration during multimodal object recognition in humans: a behavioral and electrophysiological study. J Cogn Neurosci. 11:473–490.
Groppe DM, Urbach TP, Kutas M. 2011a. Mass univariate analysis of event-related brain potentials/fields I: a critical tutorial review. Psychophysiology. 48:1711–1725.
Groppe DM, Urbach TP, Kutas M. 2011b. Mass univariate analysis of event-related brain potentials/fields II: simulation studies. Psychophysiology. 48:1726–1737.
Haroush K, Deouell LY, Hochstein S. 2011. Hearing while blinking: multisensory attentional blink revisited. J Neurosci. 31:922–927.
Hopf JM, Vogel E, Woodman G, Heinze HJ, Luck SJ. 2002. Localizing visual discrimination processes in time and space. J Neurophysiol. 88:2088–2095.
Iordanescu L, Guzman-Martinez E, Grabowecky M, Suzuki S. 2008. Characteristic sound facilitates visual search. Psychon Bull Rev. 15:548–554.
Joos E, Giersch A, Hecker L, Schipp J, Heinrich SP, Tebartz van Elst L, Kornmeier J. 2020. Large EEG amplitude effects are highly similar across Necker cube, smiley, and abstract stimuli. PLoS One. 15:e0232928.
Kang G, Chang W, Wang L, Wei P, Zhou X. 2018. Reward enhances cross-modal conflict control in object categorization: electrophysiological evidence. Psychophysiology. e13214.
Kanwisher NG. 1987. Repetition blindness: type recognition without token individuation. Cognition. 27:117–143.
Kaya U, Kafaligonul H. 2019. Cortical processes underlying the effects of static sound timing on perceived visual speed. Neuroimage. 199:194–205.
Koelewijn T, Van der Burg E, Bronkhorst AW, Theeuwes J. 2008. Priming T2 in a visual and auditory attentional blink task. Percept Psychophys. 70:658–666.
Kranczioch C, Thorne JD. 2013. Simultaneous and preceding sounds enhance rapid visual targets: evidence from the attentional blink. Adv Cogn Psychol. 9:130–142.
Kranczioch C, Debener S, Maye A, Engel AK. 2007. Temporal dynamics of access to consciousness in the attentional blink. Neuroimage. 37:947–955.
Kranczioch C, Thorne JD. 2015. The beneficial effects of sounds on attentional blink performance: an ERP study. Neuroimage. 117:429–438.
Kutas M, Federmeier KD. 2000. Electrophysiology reveals semantic memory use in language comprehension. Trends Cogn Sci. 4:463–470.
Kutas M, Hillyard SA. 1980. Reading senseless sentences: brain potentials reflect semantic incongruity. Science. 207:203–205.
Laurienti PJ, Kraft RA, Maldjian JA, Burdette JH, Wallace MT. 2004. Semantic congruence is a critical factor in multisensory behavioral performance. Exp Brain Res. 158:405–414.
Lee WT, Kang MS. 2020. Electrophysiological evidence for distinct proactive control mechanisms in a stop-signal task: an individual differences approach. Front Psychol. 11:1105.
Lopez-Calderon J, Luck SJ. 2014. ERPLAB: an open-source toolbox for the analysis of event-related potentials. Front Hum Neurosci. 8:213.
Luck SJ. 2014. An introduction to the event-related potential technique. 2nd ed. Cambridge (MA): MIT Press.
Luck SJ, Gaspelin N. 2017. How to get statistically significant effects in any ERP experiment (and why you shouldn't). Psychophysiology. 54:146–157.
Luo W, Feng W, He W, Wang N, Luo Y. 2010. Three stages of facial expression processing: ERP study with rapid serial visual presentation. Neuroimage. 49:1857–1867.
Luo W, He W, Yang S, Feng W, Chen T, Wang L, Gan T, Luo Y. 2013. Electrophysiological evidence of facial inversion with rapid serial visual presentation. Biol Psychol. 92:395–402.
Maier M, Rahman RA. 2018. Native language promotes access to visual consciousness. Psychol Sci. 29:1757–1772.
Mangun GR. 1995. Neural mechanisms of visual selective attention. Psychophysiology. 32:4–18.
McDonald JJ, Störmer VS, Martínez A, Feng W, Hillyard SA. 2013. Salient sounds activate human visual cortex automatically. J Neurosci. 33:9194–9201.
McDonald JJ, Teder-Sälejärvi WA, Di Russo F, Hillyard SA. 2003. Neural substrates of perceptual enhancement by cross-modal spatial attention. J Cogn Neurosci. 15:10–19.
Michail G, Keil J. 2018. High cognitive load enhances the susceptibility to non-speech audiovisual illusions. Sci Rep. 8:11530.
Mishra J, Gazzaley A. 2012. Attention distributed across sensory modalities enhances perceptual performance. J Neurosci. 32:12294–12302.
Mishra J, Gazzaley A. 2013. Preserved discrimination performance and neural processing during crossmodal attention in aging. PLoS One. 8:e81894.
Mishra J, Martínez A, Hillyard SA. 2008. Cortical processes underlying sound-induced flash fusion. Brain Res. 1242:102–115.
Mishra J, Martínez A, Hillyard SA. 2010. Effect of attention on early cortical processes associated with the sound-induced extra flash illusion. J Cogn Neurosci. 22:1714–1729.
Mishra J, Martínez A, Sejnowski T, Hillyard SA. 2007. Early cross-modal interactions in auditory and visual cortex underlie a sound-induced visual illusion. J Neurosci. 27:4120–4131.
Molholm S, Martinez A, Shpaner M, Foxe JJ. 2007. Object-based attention is multisensory: co-activation of an object's representations in ignored sensory modalities. Eur J Neurosci. 26:499–509.
Molholm S, Murphy JW, Bates J, Ridgway EM, Foxe JJ. 2020. Multisensory audiovisual processing in children with a sensory processing disorder (I): behavioral and electrophysiological indices under speeded response conditions. Front Integr Neurosci. 14:4.
Molholm S, Ritter W, Javitt DC, Foxe JJ. 2004. Multisensory visual-auditory object recognition in humans: a high-density electrical mapping study. Cereb Cortex. 14:452–465.
Molholm S, Ritter W, Murray MM, Javitt DC, Schroeder CE, Foxe JJ. 2002. Multisensory auditory-visual interactions during early sensory processing in humans: a high-density electrical mapping study. Cogn Brain Res. 14:115–128.
Olivers CNL, Van der Burg E. 2008. Bleeping you out of the blink: sound saves vision from oblivion. Brain Res. 1242:191–199.
Olivers CNL, van der Stigchel S, Hulleman J. 2007. Spreading the sparing: against a limited-capacity account of the attentional blink. Psychol Res. 71:126–139.
Pierce AM, McDonald JJ, Green JJ. 2018. Electrophysiological evidence of an attentional bias in crossmodal inhibition of return. Neuropsychologia. 114:11–18.
Raij T, Ahveninen J, Lin FH, Witzel T, Jääskeläinen IP, Letham B, Israeli E, Sahyoun C, Vasios C, Stufflebeam S, et al. 2010. Onset timing of cross-sensory activations and multisensory interactions in auditory and visual sensory cortices. Eur J Neurosci. 31:1772–1782.
Raymond JE, Shapiro KL, Arnell KM. 1992. Temporary suppression of visual processing in an RSVP task: an attentional blink? J Exp Psychol Hum Percept Perform. 18:849–860.
Senkowski D, Saint-Amour D, Höfle M, Foxe JJ. 2011. Multisensory interactions in early evoked brain activity follow the principle of inverse effectiveness. Neuroimage. 56:2200–2208.
Sergent C, Baillet S, Dehaene S. 2005. Timing of the brain events underlying access to consciousness during the attentional blink. Nat Neurosci. 8:1391–1400.
Shapiro KL, Caldwell J, Sorensen RE. 1997. Personal names and the attentional blink: a visual "cocktail party" effect. J Exp Psychol Hum Percept Perform. 23:504–514.
Shapiro KL, Raymond JE. 1994. Temporal allocation of visual attention: inhibition or interference? In: Dagenbach D, Carr TH, editors. Inhibitory processes in attention, memory, and language. San Diego (CA): Academic Press. pp. 151–188.
Sinke C, Neufeld J, Wiswede D, Emrich HM, Bleich S, Münte TF, Szycik GR. 2014. N1 enhancement in synesthesia during visual and audio-visual perception in semantic cross-modal conflict situations: an ERP study. Front Hum Neurosci. 8:21.
Störmer VS, Feng W, Martínez A, McDonald JJ, Hillyard SA. 2016. Salient, irrelevant sounds reflexively induce alpha rhythm desynchronization in parallel with slow potential shifts in visual cortex. J Cogn Neurosci. 28:433–445.
Suied C, Bonneel N, Viaud-Delmon I. 2009. Integration of auditory and visual information in the recognition of realistic objects. Exp Brain Res. 194:91–102.
Talsma D. 2015. Predictive coding and multisensory integration: an attentional account of the multisensory mind. Front Integr Neurosci. 9:19.
Talsma D, Doty TJ, Woldorff MG. 2007. Selective attention and audiovisual integration: is attending to both modalities a prerequisite for early integration? Cereb Cortex. 17:679–690.
Talsma D, Senkowski D, Soto-Faraco S, Woldorff MG. 2010. The multifaceted interplay between attention and multisensory integration. Trends Cogn Sci. 14:400–410.
Talsma D, Woldorff MG. 2005. Selective attention and multisensory integration: multiple phases of effects on the evoked brain activity. J Cogn Neurosci. 17:1098–1114.
Tanaka J, Luu P, Weisbrod M, Kiefer M. 1999. Tracking the time course of object categorization using event-related potentials. Neuroreport. 10:829–835.
Tang X, Wu J, Shen Y. 2016. The interactions of multisensory integration with endogenous and exogenous attention. Neurosci Biobehav Rev. 61:208–224.
Teder-Sälejärvi WA, Di Russo F, McDonald JJ, Hillyard SA. 2005. Effects of spatial congruity on audio-visual multimodal integration. J Cogn Neurosci. 17:1396–1409.
Teder-Sälejärvi WA, McDonald JJ, Di Russo F, Hillyard SA. 2002. An analysis of audio-visual crossmodal integration by means of event-related potential (ERP) recordings. Cogn Brain Res. 14:106–114.
Tse CY, Gratton G, Garnsey SM, Novak MA, Fabiani M. 2015. Read my lips: brain dynamics associated with audiovisual integration and deviance detection. J Cogn Neurosci. 27:1723–1737.
Van der Burg E, Olivers CNL, Bronkhorst AW, Theeuwes J. 2008. Pip and pop: non-spatial auditory signals improve spatial visual search. J Exp Psychol Hum Percept Perform. 34:1053–1065.
Van der Burg E, Talsma D, Olivers CNL, Hickey C, Theeuwes J. 2011. Early multisensory interactions affect the competition among multiple visual objects. Neuroimage. 55:1208–1218.
Van Vleet TM, Robertson LC. 2006. Cross-modal interactions in time and space: auditory influence on visual attention in hemispatial neglect. J Cogn Neurosci. 18:1368–1379.
Vogel EK, Luck SJ. 2000. The visual N1 component as an index of a discrimination process. Psychophysiology. 37:190–203.
Vogel EK, Luck SJ, Shapiro KL. 1998. Electrophysiological evidence for a postperceptual locus of suppression during the attentional blink. J Exp Psychol Hum Percept Perform. 24:1656–1674.
Volosin M, Horváth J. 2020. Task difficulty modulates voluntary attention allocation, but not distraction in an auditory distraction paradigm. Brain Res. 1727:146565.
Weissman DH, Warner LM, Woldorff MG. 2004. The neural mechanisms for minimizing cross-modal distraction. J Neurosci. 24:10941–10949.
Weissman DH, Warner LM, Woldorff MG. 2009. Momentary reductions of attention permit greater processing of irrelevant stimuli. Neuroimage. 48:609–615.
Wu J, Li Q, Bai O, Touge T. 2009. Multisensory interactions elicited by audiovisual stimuli presented peripherally in a visual attention task: a behavioral and event-related potential study in humans. J Clin Neurophysiol. 26:407–413.
Yang W, Li Q, Ochi T, Yang J, Gao Y, Tang X, Takahashi S, Wu J. 2013. Effects of auditory stimuli in the horizontal plane on audiovisual integration: an event-related potential study. PLoS One. 8:e66402.
Yuval-Greenberg S, Deouell LY. 2007. What you see is not (always) what you hear: induced gamma band responses reflect cross-modal interactions in familiar object recognition. J Neurosci. 27:1090–1096.
Yuval-Greenberg S, Deouell LY. 2009. The dog's meow: asymmetrical interaction in cross-modal object recognition. Exp Brain Res. 193:603–614.
Zhao S, Wang Y, Xu H, Feng C, Feng W. 2018. Early cross-modal interactions underlie the audiovisual bounce-inducing effect. Neuroimage. 174:208–218.
Zhao S, Wang Y, Feng C, Feng W. 2020. Multiple phases of cross-sensory interactions associated with the audiovisual bounce-inducing effect. Biol Psychol. 149:107805.
Zimmer U, Itthipanyanan S, Grent-'t-Jong T, Woldorff MG. 2010. The electrophysiological time course of the interaction of stimulus conflict and the multisensory spread of attention. Eur J Neurosci. 31:1744–1754.
© The Author(s) 2020. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model) TI - Neural Basis of Semantically Dependent and Independent Cross-Modal Boosts on the Attentional Blink JF - Cerebral Cortex DO - 10.1093/cercor/bhaa362 DA - 2020-12-07 UR - https://www.deepdyve.com/lp/oxford-university-press/neural-basis-of-semantically-dependent-and-independent-cross-modal-sf0Iek0E50 SP - 1 EP - 1 VL - Advance Article IS - DP - DeepDyve ER -