Naya, Yuji

Abstract

Perceptual processing along the ventral visual pathway to the hippocampus (HPC) is hypothesized to be substantiated by a signal transformation from retinotopic space to relational space, which represents interrelations among constituent visual elements. However, our visual perception necessarily reflects the first person's perspective, which is grounded in retinotopic space. To investigate this two-facedness of visual perception, we compared neural activities in the temporal lobe (anterior inferotemporal cortex, perirhinal and parahippocampal cortices, and HPC) between when monkeys gazed at an object and when they fixated on the screen center with an object in their peripheral vision. We found that, in addition to the spatially invariant object signal, the temporal lobe areas automatically represent a large-scale background image, which specifies the subject's viewing location. These results suggest that a combination of two distinct visual signals, one on relational space and one on retinotopic space, may provide the first person's perspective serving perception and, presumably, subsequent episodic memory.

Keywords: macaque monkey, medial temporal lobe, figure-ground segmentation, relational space, retinotopic space

Introduction

Visual information about our external world may first be decomposed into "what" and "where" before we attain its mental representation from the first person's perspective (Tulving 2002; Eichenbaum et al. 2007; Palombo et al. 2015). For several decades, the perception of these two visual features was considered to proceed exclusively through the ventral and dorsal pathways, respectively (Mishkin and Ungerleider 1982; Haxby et al. 1991; Goodale and Milner 1992). In contrast to this widespread dichotomy, contemporary visual neuroscience suggests the presence of spatial information in the ventral pathway for perception (Schenk 2010; Epstein and Julian 2013; Kornblith et al. 2013; Freud et al. 2016; Hong et al. 2016; Connor and Knierim 2017; Mormann et al. 2017; Chen and Naya 2020). For instance, neurons in the inferotemporal (IT) cortex (TEO and TEd) of nonhuman primates exhibit preferential responses to scene-like rather than object-like stimuli (Kornblith et al. 2013; Vaziri et al. 2014). The response pattern of scene-selective IT neurons may be comparable to the activation pattern of the parahippocampal place area detected in human functional imaging studies (Epstein and Kanwisher 1998; Julian and Epstein 2013). The parahippocampal place area is located within the parahippocampal cortex (PHC) of the medial temporal lobe (MTL), which receives inputs from the early stages of the ventral pathway, including TEO and posterior TEd, in addition to inputs from the dorsal pathway, and provides spatial information to the hippocampus (HPC), a candidate for the final brain region for scene perception (Burgess 2008), via the medial/posterior entorhinal cortex (ERC) (Killian et al. 2015; Meister and Buffalo 2018; Rolls 2018). On the other hand, neurons in the IT cortex also represent the location of an object within a scene, either at the population-coding level (Hong et al. 2016) or at the single-neuron level (Chen and Naya 2020). It is worth noting that, while most neurophysiological studies had shown spatial invariance of object responses at the single-neuron level while monkeys fixated on the center of a display under either a passive-viewing task (Kobatake and Tanaka 1994; Hong et al. 2016) or a delayed matching-to-sample (object) task (Miyashita and Chang 1988; Nakamura et al. 1994),
our recent study demonstrated that equivalent or even larger numbers of neurons exhibit a location signal compared with an object signal in the ventral part of the anterior inferotemporal cortex (TEv) and its downstream MTL area (e.g., perirhinal cortex, PRC) during an item-location retention (ILR) task requiring monkeys to encode both the identity and the location of a sample object using foveal vision (Fig. 1). Importantly, the location-selective activity during the ILR task could not be explained by the animals' eye positions themselves (Chen and Naya 2020).

Figure 1. Encoding of location and item in two view conditions. (A) Schematic diagram of location and item encoding in the F-V and P-V conditions of the active-encoding and passive-encoding tasks. In the active-encoding task, the cue stimulus was the same as the sample stimulus during the encoding phase in the match trial (top), while the two stimuli differed in the nonmatch trial (bottom). Red circles indicate correct answers. The passive-encoding task consisted of only the encoding phase of the active-encoding task. (B) Examples of coronal sections from monkey A and monkey F. The sections from monkey A are 16 and 10.5 mm anterior to the interaural line and include the HPC, PHC, perirhinal cortex (PRC), and area TE (TE). amts, anterior middle temporal sulcus; ots, occipital temporal sulcus; rs, rhinal sulcus. Coronal sections from monkey F are 19.2 and 8.4 mm anterior to the interaural line. (C) Six object stimuli were used in the task, and an example of the spatial composition during the sample period is shown. A yellow disk indicates an object position. (D) Schematic diagram of visual inputs to the retinae during the sample period; white dashed lines indicate the horizontal and vertical meridians of the visual field.

Considering that different gaze positions cause a substantial difference in the large-scale visual input in the ILR task under the foveal-view (F-V) condition (Fig. 1D),
the most straightforward explanation for the robust location signal might be that a substantial number of neurons in the IT cortex and MTL areas are driven by the retinotopic signal, including parafoveal vision, which would not only serve for recognizing a scene (Dilks et al. 2011; Kornblith et al. 2013; Vaziri et al. 2014; Connor and Knierim 2017) but also signal a particular location in the scene (Hong et al. 2016; Meister and Buffalo 2018; Chen and Naya 2020). An alternative explanation would be that the location information of an object is coded as internal spatial relationships within a large complex stimulus comprising the object and its background, regardless of their absolute retinotopic positions. In other words, the IT cortex and MTL areas would represent object location by transforming representations of the object and its background on the retinotopic space (Zhaoping 2019) into those on the "relational space" (Connor and Knierim 2017). In this case, the location signal in the ILR task would be sensitive to the task demand requiring the animals to retain the object location for a following action rather than to a retinotopic image depending on the animals' gaze position. To address this question and investigate the characteristics of spatial information in the ventral pathway and its downstream regions (i.e., the MTL areas), we examined single-unit activities and local-field potentials (LFPs) from the TEv and MTL subregions while an object stimulus was presented randomly at one of the quadrants of the display in a peripheral-view (P-V) condition as well as in the F-V condition (Fig. 1). In the P-V condition, animals were required to fixate on a central dot and obtain the location and item-identity information of the sample object using their peripheral vision (Fig. 1A). We compared the location effects between the two view conditions in two rhesus macaques and found that, regardless of the task demands for encoding an object and its location, the location signal was much more abundant in the F-V condition than in the P-V condition in all the recording regions of both monkeys. These results suggest that a retinotopic-space representation of object location is present in the ventral pathway and spreads over the MTL areas.

Materials and Methods

Subjects

Two male monkeys (Macaca mulatta) (9.3 kg, monkey A; 10.1 kg, monkey F) were used for the experiments. All procedures and treatments were performed in accordance with the NIH Guide for the Care and Use of Laboratory Animals and were approved by the Institutional Animal Care and Use Committee (IACUC) of Peking University.

Behavioral Task

We trained monkey A on the F-V condition of an active-encoding task with six visual items (Fig. 1). During both training and recording sessions, monkeys performed the task under dim light in an electromagnetically shielded room (length × width × height = 160 × 120 × 222 cm). The task began with an encoding phase, which was initiated by the animal pulling a lever and fixating on a white square (0.6° of visual angle) presented within one of the four quadrants (12.5° from the center) of a touch screen (3M MicroTouch Display M1700SS, 17 inch, horizontal viewing angle: ~59°, vertical viewing angle: ~49°) with a custom-made metal frame (diagonal size: 22 inch, horizontal viewing angle: ~72°, vertical viewing angle: ~71°) situated ~28 cm from the subjects.
Eye position was monitored using an infrared digital camera with a 120 Hz sampling frequency (ETL-200, ISCAN) placed next to the left edge of the touch screen. The eye-position calibration was conducted before starting each recording session (MonkeyLogic). After a 0.6 s fixation, one of the six items (3.0° radius) was presented in the same quadrant as a sample stimulus for 0.3 s, followed by another 0.7 s fixation on the white square. An additional 0.017 s, reflecting the design of the software and hardware controlling the behavioral task, was added to each trial event. If the fixation was successfully maintained (typically, <2.5°), the encoding phase ended with the delivery of a single drop of water. The encoding phase was followed by a blank interphase delay interval of 0.7–1.4 s during which no fixation was required. Then, the response phase was initiated with a fixation dot presented at the center of the screen. One of the six items was then presented at the center for 0.3 s as a cue stimulus. After another 0.5 s delay period, five disks were presented as choices, including a blue disk in each quadrant and a green disk at the center. When the cue stimulus was the same as the sample stimulus, the subject was required to choose by touching the blue disk in the same quadrant as the sample (i.e., match condition). Otherwise, the subject was required to choose the green disk (i.e., nonmatch condition). If the animal made the correct choice, four to eight drops of water were given as a reward; otherwise, an additional 4 s was added to the standard intertrial interval (1.5–3 s). During the trial, a large gray square (48° on each side, RGB value: 50, 50, 50, luminance: 3.36 cd/m2) was presented at the center of the display (backlight luminance: 0.22 cd/m2) as a background. After the end of a trial, all stimuli disappeared and the entire screen displayed a light-red color during the intertrial interval. The start of a new trial was indicated by the reappearance of the large gray square on the display, upon which the monkey could start to pull the lever, triggering the appearance of a white fixation dot. In the match condition, sample stimuli were pseudorandomly chosen from the six well-learned visual items, and each item was presented pseudorandomly within the four quadrants, resulting in 24 (6 × 4) different configuration patterns. In the nonmatch condition, the position of the sample stimulus was randomly chosen from the four quadrants, and the cue stimulus was randomly chosen from the five items that differed from the sample stimulus. The match and nonmatch conditions were randomly presented at a ratio of 4:1, resulting in 30 (24 + 6) different configuration patterns. The same six stimuli were used during all recording sessions. In addition to the F-V condition, we tested the neuronal responses of monkey A in the P-V condition of the active-encoding task. In this view condition, fixation on the center of the display was required during the encoding phase (Fig. 1). Other parameters were the same as those in the F-V condition of the active-encoding task. Correct performance was 97.5 ± 2.6% in the match trials and 90.8 ± 8.1% in the nonmatch trials under the F-V condition (n = 454 sessions), and 94.3 ± 6.2% in the match trials and 84.1 ± 10.8% in the nonmatch trials under the P-V condition (n = 478 sessions).
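For reference, the trial parameters of the active-encoding task described above can be collected into a single structure. The sketch below is illustrative only; the field names are hypothetical and do not come from the original task code (which was run under MonkeyLogic), but the values are those stated in the text.

```matlab
% Illustrative summary of the active-encoding trial parameters (hypothetical field names).
task.fixSquareSizeDeg     = 0.6;       % white fixation square (visual angle)
task.fixEccentricityDeg   = 12.5;      % quadrant fixation position from the screen center (F-V)
task.preSampleFixSec      = 0.6;       % fixation before sample onset
task.sampleDurSec         = 0.3;       % sample presentation
task.postSampleFixSec     = 0.7;       % fixation after sample offset
task.eventPaddingSec      = 0.017;     % extra time added to each trial event
task.fixWindowDeg         = 2.5;       % typical fixation tolerance
task.interPhaseDelaySec   = [0.7 1.4]; % blank delay, no fixation required
task.cueDurSec            = 0.3;       % cue at the screen center in the response phase
task.cueToChoiceDelaySec  = 0.5;       % delay between cue offset and choice array
task.itiSec               = [1.5 3.0]; % standard intertrial interval
task.errorTimeoutSec      = 4.0;       % added to the ITI after an incorrect choice
task.sampleRadiusDeg      = 3.0;       % sample stimulus size
task.backgroundSideDeg    = 48;        % gray background square, each side
task.nItems               = 6;         % six well-learned visual items
task.nLocations           = 4;         % one location per quadrant
task.matchToNonmatch      = [4 1];     % match : nonmatch presentation ratio
```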
We tested the neuronal responses of monkey F in both the F-V and P-V conditions of a passive-encoding task, in which the task sequence and requirements were the same as those of the encoding phase of the active-encoding task but without a lever-pulling requirement (and with no interphase delay interval or response phase). The configuration of the visual stimuli (visual angles, configuration patterns, and so on) was the same as that for monkey A. We tested the neuronal responses of both monkey A and monkey F in the F-V and P-V conditions in a blocked manner.

Electrophysiological Recording

Following initial behavioral training, animals were implanted with a head post and recording chamber under aseptic conditions using isoflurane anesthesia. To record single-unit activity, we used a 16-channel vector array microprobe (V1 X 16-Edge, NeuroNexus), a 16-channel U-Probe (Plexon), a tungsten tetrode probe (Thomas RECORDING), or a single-wire tungsten microelectrode (Alpha Omega), which was advanced into the brain using a hydraulic microdrive (MO-97A, Narishige) (Naya and Suzuki 2011). The microelectrode was inserted through a stainless steel guide tube positioned in a customized grid system on the recording chamber. Neural signals for single units were collected (low-pass, 6 kHz; high-pass, 200 Hz) and digitized (40 kHz) (OmniPlex Neural Data Acquisition System, Plexon). These signals were then sorted using an offline sorter provided by the OmniPlex system. We did not attempt to prescreen isolated neurons; instead, once we isolated any neuron, we started to record its activity. The location of microelectrodes in the target areas was guided by individual brain atlases from MRI scans (3T, Siemens). We also constructed individual brain atlases based on the electrophysiological properties around the tip of the electrode (e.g., gray matter, white matter, sulcus, lateral ventricle, and bottom of the brain). The recording sites were estimated by combining the individual MRI atlases and physiological atlases (Matsui et al. 2007; Naya et al. 2017). To record LFPs, we used neural signals from the same electrodes as those used for the recording of spikes. However, the signals were collected using different filters (low-pass, 200 Hz; high-pass, 0.05 Hz) and digitized at 1 kHz. The recording sites in monkey A covered an area between 5 and 24 mm anterior to the interaural line (right hemisphere). The recording sites in monkey F covered an area between 6.6 and 23.4 mm anterior to the interaural line (right hemisphere). The recording sites in the HPC appeared to cover all of its subdivisions (i.e., dentate gyrus, CA3, CA1, and subicular complex). The recording sites in the PHC covered approximately its lateral two-thirds. The recording sites in the PRC appeared to cover areas 35 and 36 from the fundus of the rhinal sulcus to the medial lip of the anterior middle temporal sulcus (amts). The caudal limit of the PRC (the rostral limit of the PHC) was determined according to the rostral limit of the occipital temporal sulcus and the caudal limit of the rhinal sulcus (Suzuki and Amaral 2003). In monkey A, the caudal limit of the recording sites in the PRC was 2 mm posterior to the caudal limit of its rhinal sulcus and 1 mm anterior to the rostral limit of the occipital temporal sulcus. In monkey F, the caudal limit of the recording sites in the PRC was at the caudal limit of its rhinal sulcus and at the rostral limit of the occipital temporal sulcus. The recording sites in area TE were limited to its ventral part (TEv), including both banks of the amts.
Data Analysis

All neuronal data were analyzed using MATLAB (MathWorks) with custom-written programs, including the statistics toolbox. For responses during/after sample presentation, the firing rate during the period extending from 80 to 1000 ms after sample onset was tested. For sample responses, we evaluated the effects of "location," "item," and their "interaction" for each neuron using a two-way ANOVA with interaction (P < 0.01 for each). We analyzed neurons that were tested in at least 60 trials (10 trials for each stimulus, 15 trials for each location) in each view condition.

Representational Similarity Analyses

We examined the location and item effects coded by the response patterns of all recorded neurons in each area using representational similarity analyses (RSA) (Kriegeskorte et al. 2008; Zhang and Naya 2020). We first calculated the mean firing rate during the period extending from 80 to 1000 ms after sample onset in each of the 24 trial types (six items × four locations) for each neuron and then standardized the 24 values using a z-transformation. We then constructed 24 n-dimensional population vectors consisting of the z-scores of all recorded neurons in each area, where n indicates the number of recorded neurons in each area and each task. By calculating correlation coefficients (Pearson's linear correlation coefficient) between the population vectors, we estimated the similarity between the trial types in each area. After constructing the correlation matrices (Fig. 5), we collected the correlation coefficients of the trial-type pairs with the same locations and different items ("same," 60 pairs) and those with different locations and different items ("different," 180 pairs). The average of the z-scored correlation coefficients of the "different" trial-type pairs was subtracted from that of the "same" trial-type pairs to estimate the location effect. Finally, we examined the influence of the view conditions on the location effect by testing the interaction effect of a two-way ANOVA with trial type ("same" vs. "different") and view condition ("F-V" vs. "P-V") as main factors. P values were corrected by Bonferroni correction among the four brain regions. For the evaluation of the item effect, the "same" trial-type pairs had the same items and different locations (36 pairs), while the "different" trial-type pairs had different items and different locations (180 pairs).
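The core of the RSA computation can be sketched in a few lines of MATLAB. The snippet below is a minimal illustration of the location-effect estimate for one area and view condition, assuming a hypothetical matrix rates of trial-averaged firing rates (neurons × 24 trial types, ordered so that the item index varies fastest within each location); the variable names and the Fisher z reading of the z-scoring step are not taken from the original analysis code.

```matlab
% rates: nNeurons x 24 matrix of mean firing rates (80-1000 ms after sample onset),
% columns ordered as 4 locations x 6 items (item varies fastest; hypothetical layout).
nItems = 6; nLocs = 4;
locID  = repelem(1:nLocs, nItems);      % [1 1 1 1 1 1 2 2 ... 4]
itemID = repmat(1:nItems, 1, nLocs);    % [1 2 3 4 5 6 1 2 ... 6]

Z = zscore(rates, 0, 2);                % standardize the 24 values within each neuron
R = corr(Z);                            % 24 x 24 Pearson similarity between population vectors

sameLoc = []; diffLoc = [];
for a = 1:23
    for b = a+1:24
        if itemID(a) == itemID(b), continue, end   % exclude same-item pairs
        if locID(a) == locID(b)
            sameLoc(end+1) = R(a, b);   % same location, different items (60 pairs)
        else
            diffLoc(end+1) = R(a, b);   % different locations, different items (180 pairs)
        end
    end
end

% Location effect: difference of the mean z-transformed correlations
locEffect = mean(atanh(sameLoc)) - mean(atanh(diffLoc));
```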
LFP Analysis

For the spectral analysis of the LFPs, we used the multitaper method in the CHRONUX toolbox developed by P. Mitra at Cold Spring Harbor Laboratories (Jarvis and Mitra 2001; Pesaran et al. 2002). In the multitaper method, we first calculated spectral estimates for the individual tapers as Fourier transforms of the data multiplied by each taper and then averaged the tapered Fourier transforms. Using five Slepian tapers, we estimated the LFP spectrum in a 300 ms window (10 Hz resolution) stepping at 50 ms intervals (Naya and Suzuki 2011). The power of the LFP spectrum was expressed in dB. To examine changes in LFP activity after sample onset, the baseline activity (mean over −600 to 0 ms from sample onset) at each frequency was subtracted from the LFP activity at that frequency at each time point for each recording site. We compared the beta (12–25 Hz) and gamma (30–80 Hz) frequency-band activities between the F-V and P-V conditions. For each recording site, we first calculated the mean LFP activity across the time-frequency pixels within the 12–25 Hz band for beta and the 30–80 Hz band for gamma. We then averaged these values along the time axis during the early (0–300 ms after sample onset) and late (350–800 ms after sample onset) sample periods. Finally, we tested whether these values differed between the two view conditions at the population level in each brain region using a two-tailed paired t-test. P values were corrected by Bonferroni correction among the four brain regions.
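A compact sketch of this spectral pipeline is given below, assuming the Chronux function mtspecgramc is on the MATLAB path and that lfp is a hypothetical samples × trials matrix of sample-aligned voltage traces. The taper setting [3 5] is an assumption chosen only because it yields five Slepian tapers; it is not stated in the original methods.

```matlab
% Multitaper spectrogram with Chronux (assumed available on the path).
params.Fs       = 1000;         % LFP sampling rate (Hz)
params.tapers   = [3 5];        % [time-bandwidth product, number of tapers] -> 5 Slepian tapers
params.fpass    = [1 100];      % frequency range of interest (Hz)
params.trialave = 1;            % average across trials
movingwin       = [0.3 0.05];   % 300 ms window stepping at 50 ms

tOnset = 0.6;                   % sample onset time within each trial (hypothetical alignment)
[S, t, f] = mtspecgramc(lfp, movingwin, params);   % S: time x frequency power

SdB = 10 * log10(S);            % express power in dB

% Subtract the pre-sample baseline (-600 to 0 ms from sample onset) at each frequency
baseIdx = t >= tOnset - 0.6 & t < tOnset;
SdB     = SdB - mean(SdB(baseIdx, :), 1);

% Band- and period-averaged activity, e.g., gamma band in the late sample period
gammaIdx  = f >= 30 & f <= 80;
lateIdx   = t >= tOnset + 0.35 & t <= tOnset + 0.80;
gammaLate = mean(mean(SdB(lateIdx, gammaIdx)));

% Across recording sites, the two view conditions can then be compared with a
% paired t-test, e.g., [~, p] = ttest(gammaLate_FV, gammaLate_PV);  % hypothetical vectors
```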
Results

We collected data in both the F-V and P-V conditions from two rhesus macaques (Fig. 1B). During the recording, monkey A was required to encode the identity of a sample stimulus and its location actively for a subsequent response (i.e., the ILR task). We reported the single-unit data in the F-V condition of the ILR task in a previous study (Chen and Naya 2020); here, we refer to the ILR task as the "active-encoding task." On the other hand, monkey F was only required to fixate on a small white dot, viewing a sample stimulus passively ("passive-encoding task") in both view conditions (Fig. 1A,B). We did not record from the same monkey in both encoding tasks because it would have been difficult to exclude both explicit and implicit influences of learning the active-encoding task on cognitive processing in the passive-encoding task. We used the same six visual objects (yellow Chinese characters, radius = 3°) as sample stimuli for both monkeys throughout all the recording sessions (Fig. 1C). It should be noted that the retinotopic images differed entirely between the F-V and P-V conditions, although the position of a sample stimulus relative to the external world, including the large square background (48° on each side) on the display, was identical between the two view conditions (Fig. 1C,D). This two-by-two experimental design ("F-V vs. P-V" × "active-encoding vs. passive-encoding") allowed us to compare the neural signals in the F-V condition with those in the P-V condition in animals with different task demands. Monkey A performed the active-encoding task with high accuracy in both the F-V (96.2 ± 3.7%, 454 sessions) and P-V (92.2 ± 7.1%, 477 sessions) conditions.

Figure 2. Responses of the location-selective and item-selective cells in the active-encoding and passive-encoding tasks. (A) Example of a location-selective cell from TE in the F-V and P-V conditions of the active-encoding task. (Left) Spike-density functions (SDFs) (sigma = 20 ms) indicating the firing rates under two conditions (best location and the average of the other three locations). Lines and shadings indicate the mean and standard errors across trials, respectively. The open triangle indicates the start of the fixation acquisition, which is 0.6 s before the start of sample presentation. Horizontal yellow bars indicate the presentation of the sample stimulus (0.3 s) (see Fig. 1A). (Right) Bar graph indicating the mean firing rate during the sample period (0.08–1.0 s after sample onset) for each location and each item. (B) Example of a location-selective cell from PRC in the F-V and P-V conditions of the active-encoding task. (C,D) Examples of location-selective cells in TE (C) and PHC (D) in the F-V and P-V conditions of the passive-encoding task. (E,F) Examples of item-selective cells in TE (E) and PRC (F) in the F-V and P-V conditions of the active-encoding task. (G,H) Examples of item-selective cells in TE (G) and HPC (H) in the F-V and P-V conditions of the passive-encoding task. *P < 0.01, **P < 0.001, ***P < 0.0001, two-way ANOVA with interaction for each cell and view condition.

Gaze-Related Location Signal

We first investigated single-unit activities signaling location information. Figure 2A shows an example of a TEv neuron recorded in the active-encoding task. The neuron showed the largest responses when the animal fixated on position I (top right on the large square background). Although the response decayed, the neuron responded strongly again when an item stimulus was presented as a sample stimulus at the same position I in the F-V condition. We examined the neuronal responses during 80–1000 ms after the onset of sample presentation (sample period) using a two-way ANOVA with item identity (six items) and location (four locations) as main effects. The neuron showed a significant location effect (P < 0.0001, F(3,156) = 18.98) but no effect of the item identity of the sample stimuli (P = 0.309, F(5,156) = 1.21). In contrast to the strong location selectivity in the F-V condition, the same TEv neuron did not show location-selective activity in the P-V condition during the sample period (P = 0.183, F(3,157) = 1.63). Figure 2B shows an example of a PRC neuron that also exhibited location-selective activity only in the F-V condition. This neuron signaled location information only after sample presentation in the F-V condition, suggesting that the presence of a location signal in the F-V condition cannot necessarily be explained by preceding location-selective activity before sample presentation. We examined the prevalence of the location signal in the two view conditions among the recording regions by calculating the proportions of neurons with significant (P < 0.01, two-way ANOVA) location-selective activity during the sample period in each area. All recording regions contained significantly (P < 0.0001 in TE, PRC, and HPC; P = 0.0011 in PHC; χ2 test, Bonferroni correction) larger proportions of location-selective cells in the F-V condition (26%, TE; 29%, PRC; 21%, HPC; 19%, PHC) than in the P-V condition (7%, TE; 10%, PRC; 7%, HPC; 5%, PHC) (Fig. 3A, top). These results indicate that the location information in the ventral pathway and MTL areas was sensitive to the view conditions, even though the same task-relevant information was required for a following action in the active-encoding task. The robust location signal only in the F-V condition implies that the temporal lobe areas represent the visual image that the subject views rather than goal-directed spatial information related to an action plan.
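The proportion comparison described above amounts to a 2 × 2 test of independence (view condition × selectivity) per region. A minimal sketch, assuming hypothetical logical vectors selFV and selPV that flag the location-selective neurons tested in each view condition (these names are not from the original code):

```matlab
% selFV, selPV: logical vectors, one entry per neuron tested in the region,
% true if the neuron showed a significant location effect (P < 0.01, two-way ANOVA).
group     = [ones(numel(selFV), 1); 2 * ones(numel(selPV), 1)];   % 1 = F-V, 2 = P-V
selective = double([selFV(:); selPV(:)]);

[~, chi2stat, p] = crosstab(group, selective);   % chi-square test of independence
pCorrected = min(p * 4, 1);                      % Bonferroni correction over the four regions
```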
Figure 3. Proportions of location-selective and item-selective cells. (A) Proportions of location-selective cells (top), item-selective cells (middle), and interaction cells (bottom) during the sample period (80–1000 ms after sample onset) in the F-V (filled bars) and P-V (open bars) conditions in the active-encoding task. Numbers of recorded neurons (tested in both view conditions) are indicated in parentheses. **P = 0.0011, χ2 = 13.24, d.f. = 1 for PHC. ***P < 0.0001, χ2 = 20.67, 20.01, and 30.82 for TE, PRC, and HPC, respectively. P values were corrected by Bonferroni correction among the four recording regions. (B) Proportions of selective cells in the passive-encoding task. *P = 0.014, χ2 = 8.66 for PHC. **P = 0.0034, χ2 = 11.14 for TE. ***P < 0.0001, χ2 = 20.31 for HPC.

The different sensitivity to the two view conditions was also observed for the location signal in the passive-encoding task (Fig. 2C,D). As in the active-encoding task, we found a substantial number of neurons exhibiting a location effect (31%, TE; 16%, PRC; 23%, HPC; 33%, PHC) under the F-V condition (Fig. 3B, top). This result indicates that the location-selective response in the active-encoding task did not result from the task requirement, in which the animal was required to actively maintain the location of a sample stimulus. Compared with the F-V condition, the number of location-selective cells decreased dramatically under the P-V condition in all areas (8%, TE; 4%, PRC; 0%, HPC; 11%, PHC) (Fig. 3B, top). These results are also consistent with the single-unit results in the active-encoding task and suggest that the gaze-sensitive location signal is automatically encoded by neurons in the TEv and MTL. The marked reduction of the location signal in the P-V condition during either the active- or passive-encoding task argues against the possibility that the location-selective cells distinguish the structural organization of large objects with internal structures (e.g., a large gray square with a small letter at its top left vs. at its bottom right), which would be represented by the relational rather than the retinotopic space (Connor and Knierim 2017).
The most straightforward interpretation of the smaller number of location-selective cells under the P-V condition may be that fixating on the center of the display reduces attention to the sample stimulus and attenuates the responses of location-selective cells that showed robust location signals in the F-V condition. If this were the case, we would expect neurons with stronger location selectivity in the F-V condition to show relatively stronger location selectivity in the P-V condition (i.e., a positive correlation). To test this possibility, we estimated the strength of the location signal for neurons with location-selective activity in either the F-V or the P-V condition using the F values for the location effect in the two-way ANOVA. Notably, we observed a negative correlation in the amplitudes of the F values between the conditions across all areas during either the active-encoding (Spearman rank correlation = −0.24 among 229 neurons across areas, P = 0.0003, two-tailed) (Fig. 4A,D) or the passive-encoding task (Spearman rank correlation = −0.20 among 71 neurons, P = 0.090, two-tailed) (Fig. 4B,D). These results suggest that the weak location signal in the P-V condition was not due to attenuated attention to the sample item. A reasonable interpretation of the negatively correlated location signal might be that the separate visual inputs on the retinae drive different ensembles of neurons in the two view conditions (Fig. 1D). This interpretation is consistent with the significant reduction in the proportion of location-selective cells from the F-V to the P-V condition (Fig. 3) because a retinotopic shift of the large background square (48° on each side, Fig. 1C) in the F-V condition (Fig. 1D, left) would drive more neurons than a shift of the small sample stimulus (3° radius) in the P-V condition (Fig. 1D, right). Collectively, the TEv and MTL areas may automatically signal large-scale background information represented on the retinotopic space, which necessarily reflects the perspective that the subject is viewing.
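The across-condition comparison of selectivity strength can be sketched as follows, assuming hypothetical vectors F_FV and F_PV of location-effect F values for the neurons that were location-selective in at least one view condition (variable names are illustrative only):

```matlab
% Per neuron, the location-effect F value can be taken from the two-way ANOVA, e.g.
% (assuming fr = trial firing rates and loc/item = trial labels; the table indexing
% follows the standard anovan output layout):
%   [pvals, tbl] = anovan(fr, {loc, item}, 'model', 'interaction', 'display', 'off');
%   F_location  = tbl{2, 6};
%
% F_FV, F_PV: one F value per neuron significant in either condition (hypothetical inputs).
[rho, p] = corr(F_FV(:), F_PV(:), 'type', 'Spearman');   % rank correlation across neurons
fprintf('Spearman rho = %.2f, P = %.4g (n = %d)\n', rho, p, numel(F_FV));
```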
Figure 4. Location and item signal intensity between the two view conditions. (A) Location effect of the location-selective cells in the F-V and P-V conditions of the active-encoding task. F values in the P-V condition are plotted against those in the F-V condition for cells showing a significant location effect in either of the two view conditions. Numbers of the location-selective cells used for the final calculation in each region are indicated in parentheses. (B) Location effect in the F-V and P-V conditions of the passive-encoding task. (C) Item effect of the item-selective cells in the two view conditions of the active-encoding task. The axis ranges in A–C were adjusted for display purposes and include the majority of the data sets (A: 97.8%, B: 98.6%, C: 99.6%). (D) Correlation of the signal intensity between the two view conditions. Data from the MTL and TEv were merged in the active-encoding and passive-encoding tasks, respectively. The total numbers of location-selective and item-selective cells used for the final calculation are indicated in parentheses (left and right, respectively). na, not applicable. ρ = −0.24 and −0.20, Spearman's rank correlation, **P = 0.0003 and P = 0.09, d.f. = 227 and 69, two-tailed, for the location effect in the active- and passive-encoding tasks, respectively. ρ = 0.32, ***P < 0.0001, d.f. = 241 for the item effect in the active-encoding task.

Task-Dependent Item Signal

In contrast to the dramatic difference in location-selective activity between the F-V and P-V conditions, neurons in the temporal lobe showed consistent item-selective responses between the two view conditions during the active-encoding task (Fig. 2E,F). In all recording regions except for the PHC, we found a substantial number of item-selective cells under the P-V condition (TE 21%, PRC 23%, HPC 29%, and PHC 4%) as well as under the F-V condition (TE 15%, PRC 23%, HPC 32%, and PHC 2%) (Fig. 3A, middle). These results are consistent with previous studies indicating spatial invariance of object representation (Miyashita and Chang 1988; Kobatake and Tanaka 1994; Nakamura et al. 1994), which would be obtained by transforming the representation from the retinotopic space into the relational space along the ventral pathway (Connor and Knierim 2017). In contrast to the location signal, the signal strengths of item information were positively correlated between the F-V and P-V conditions (Fig. 4C,D). These results indicate distinct processing of the item and its background (i.e., the location signal) with regard to their sensitivity to the view conditions. Interestingly, the number of item-selective cells was negligible in all areas under both view conditions in the passive-encoding task (F-V condition: TE 6%, PRC 0%, HPC 3%, PHC 2%; P-V condition: TE 2%, PRC 2%, HPC 1%, PHC 4%; Figs 2G,H and 3B, middle), which contrasts with the substantial number of item-selective cells in the active-encoding task. The inconsistency in the item signal between the two tasks suggests that the object representation depends on the task demand, which required the subject to maintain the item identity of the sample stimulus for the following action. In addition to the location and item main effects, we examined the proportions of cells with significant (P < 0.01 for each cell, two-way ANOVA) interaction effects between the two factors.
The number of interaction cells was negligible overall, but a relatively larger number of cells showed the interaction effect in the TEv (6%), PRC (4%), and HPC (7%) under the F-V condition in the active-encoding task, in which both location and item signals were abundant (Fig. 3).

Population-Coding Analysis

The analyses based on the spike-firing data of individual neurons indicated a substantially stronger location signal in the F-V condition compared with the P-V condition regardless of the task demands. One remaining question might be whether the location signal could nevertheless be represented equivalently between the two view conditions by population coding. To test this possibility, we conducted the RSA (Kriegeskorte et al. 2008); we first constructed a population vector consisting of the firing rates of all recorded neurons in each area as its elements. In each combination of view condition and encoding type, there were 24 (six items × four locations) n-dimensional population vectors, where n indicates the number of recorded neurons in each area. We then calculated correlation coefficients between the population vectors, indicating the similarity level of neural representations between trial types with different item-location combinations. Figure 5A,B displays the similarity level of neural representations in the HPC during the sample presentation period in the active-encoding and passive-encoding tasks, respectively. The representational similarities between trial types with the same locations (e.g., location 1 item 1 and location 1 item 2) were substantially larger than the similarities between trial types with different locations (e.g., location 1 item 1 and location 2 item 2) in the F-V condition in both the active-encoding (z = 0.14, "same" minus "different") and the passive-encoding (z = 0.23) tasks, suggesting that the HPC represents the item location that the animals were viewing, regardless of the task demands. Compared with the F-V condition, the HPC's discriminability of the location of a sample stimulus was considerably diminished in the P-V condition in both the active-encoding (z = 0.04) and the passive-encoding (z = 0.02) tasks. We tested this tendency by examining the interaction effect between "same versus different" and "F-V versus P-V." All recorded regions, including the HPC, showed a marked reduction of the location signal in the P-V condition compared with the F-V condition in both tasks (P < 0.0001 for each area in each task, two-way ANOVA, Bonferroni correction) (Fig. 5C,D). Together, and consistent with the analyses based on single neurons, the population-coding analyses suggest that the temporal lobe areas represent the location information more robustly in the F-V condition than in the P-V condition. As for the item signal, the RSA also provided results consistent with those of the single-neuron-based analyses (Fig. 5E–H).
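The interaction test on the similarity values can be sketched as below, assuming hypothetical vectors of pairwise correlations collected from the RSA matrices of one region in each view condition (sameFV, diffFV, samePV, diffPV); the names are illustrative and not from the original code.

```matlab
% sameFV/diffFV, samePV/diffPV: correlation coefficients of "same-location" and
% "different-location" trial-type pairs in the F-V and P-V conditions of one region.
y = atanh([sameFV(:); diffFV(:); samePV(:); diffPV(:)]);   % Fisher z-transform

pairType = [repmat({'same'}, numel(sameFV), 1); repmat({'diff'}, numel(diffFV), 1); ...
            repmat({'same'}, numel(samePV), 1); repmat({'diff'}, numel(diffPV), 1)];
viewCond = [repmat({'FV'}, numel(sameFV) + numel(diffFV), 1); ...
            repmat({'PV'}, numel(samePV) + numel(diffPV), 1)];

% Two-way ANOVA; the third P value corresponds to the pairType x viewCond interaction
pvals = anovan(y, {pairType, viewCond}, 'model', 'interaction', ...
               'varnames', {'pairType', 'viewCond'}, 'display', 'off');
pInteraction = min(pvals(3) * 4, 1);   % Bonferroni correction over the four regions
```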
Figure 5. Location and item effects in the RSA. (A) Correlation matrices across the 24 (four locations × six items) population vectors in the HPC under the F-V (left) and P-V (right) conditions of the active-encoding task. The length of each population vector was the number of all recorded neurons from the HPC in the active-encoding task for both conditions (n = 365). The population vectors were sorted according to the location as a main category and the item as a subcategory to show the location effect. The same-item pairs (blue pixels) were eliminated from the analysis of the location effect. (B) The same format as (A) but for the location effect in the passive-encoding task. (C) Means of the correlation coefficients (z values) for the same-location pairs minus those for the different-location pairs, in the F-V (filled bars) and P-V (open bars) conditions for each recording region. Parentheses, numbers of recorded neurons. ***P < 0.0001, F(1,476) = 41.73, 77.74, 64.91, and 72.68 for TE, PRC, HPC, and PHC, respectively (interaction effect of a two-way ANOVA, Bonferroni correction among the four recording regions). (D) The same format as (C) but for the location effect in the passive-encoding task. ***P < 0.0001, F(1,476) = 52.14, 22.82, 78.33, and 46.00 for TE, PRC, HPC, and PHC, respectively. (E–H) The same formats as (A–D) but for the item effect.

Location and Item Signals along the Rostro-Caudal Axis of the Hippocampus

We compared the distributions of the item and location signals between the rostral and caudal parts of the HPC (Table 1). The single-neuron-based analysis showed that more location-selective cells were distributed in the caudal part of the HPC under the F-V condition in either the active- or passive-encoding task, whereas the item-selective neurons did not show any difference in their distribution along the rostro-caudal axis of the HPC. The same tendencies were found in the population-coding analyses using the RSA.

Table 1. Location and item encoding patterns along the rostro-caudal axis of the HPC

                             F-V condition                       P-V condition
                        Active R / C    Passive R / C       Active R / C    Passive R / C
Location   Proportion    0.16 / 0.24     0.05 / 0.42         0.07 / 0.07     0.00 / 0.00
           Correlation   0.11 / 0.15     0.11 / 0.37         0.03 / 0.06     0.01 / 0.03
Item       Proportion    0.31 / 0.33     0.00 / 0.05         0.27 / 0.30     0.02 / 0.00
           Correlation   0.22 / 0.22    −0.02 / 0.02         0.19 / 0.28     0.04 / 0.00

Active, active-encoding task (monkey A): rostral part (n = 147, 12–19 mm anterior to the interaural line) and caudal part (n = 218, 5–11 mm anterior to the interaural line). Passive, passive-encoding task (monkey F): rostral part (n = 41, 14.6–20.6 mm anterior to the interaural line) and caudal part (n = 38, 6.6–13.6 mm anterior to the interaural line). Proportion, ratio of significantly (P < 0.01, two-way ANOVA) selective neurons out of the recorded neurons in each part of the HPC. Correlation, difference in the representational similarity (z-score) between the same and different location/item pairs (see Fig. 5). R, rostral; C, caudal.

LFP Activity Depending on Both View Condition and Task Demand

In addition to the spiking data, we investigated the LFP activity during the sample period. Figure 6A shows the differential spectrograms between the viewing conditions (F-V condition minus P-V condition) in each recording region under the active-encoding task (left column) and the passive-encoding task (right column). During the early sample presentation period (0–300 ms after sample onset), beta-band activity (12–25 Hz) was enhanced nonselectively across the brain regions and tasks (Fig. 6A,B). This higher beta-band activity in the F-V condition is consistent with preceding literature indicating that larger beta-band activity is observed when the current cognitive or perceptual status should be actively maintained (i.e., the sample stimulus appears at the same position as during the fixation period in the F-V condition) than when the current state is disrupted by an unexpected event (i.e., the sample stimulus appears randomly at one of the four positions in the P-V condition) (Engel and Fries 2010). View-condition-dependent LFP activity was also observed in the gamma band (30–80 Hz) during the late sample presentation period (350–800 ms after sample onset) (Fig. 6A).
In contrast to the widely distributed beta-band activity, the gamma-band activity was expressed selectively in the PRC and HPC, and only when a sample item and its location were encoded actively through foveal vision (Fig. 6B), the situation in which both the item and location signals appeared robustly in these brain regions (Figs 3 and 5). These results may indicate that the increased gamma-band activity is related to an integration of the item and location signals to construct their conjunctive representations, which reportedly occurs in the PRC and HPC but not in the TEv or PHC (i.e., Type 2 integration in Chen and Naya 2020).

Figure 6. The differential LFP spectrograms between the F-V and P-V conditions in the active-encoding and passive-encoding tasks. (A) Population LFP spectrograms in each recording area during the active-encoding (left) and passive-encoding (right) tasks. The color of each time-frequency pixel indicates the differential power between the F-V and P-V conditions. Open triangles, start of the fixation acquisition. Horizontal yellow bars, presentation of the sample stimulus. n, number of recording sites in each area in each task. Early, early sample period. Late, late sample period. (B) The average beta-band (12–25 Hz) and gamma-band (30–80 Hz) intensities (F-V minus P-V) during the early sample (0.0–0.30 s after sample onset) and late sample (0.35–0.80 s after sample onset) periods. *P < 0.05, **P < 0.005, ***P < 0.0001, paired t-test, two-tailed, Bonferroni correction among the four recording regions.

Discussion

The present study provides single-unit data showing robust spatial information in the TEv and MTL areas, which signaled the particular location at which the animals were looking (F-V condition) rather than the position of an object presented in the peripheral view (P-V condition). These results were obtained for each of the recording regions by independent analyses for each of the two monkeys, indicating robust consistency across animals. In addition, this consistency was confirmed even though the two animals were tested under different task demands (i.e., active encoding and passive encoding of an object and its location), which underscores the robustness of the present finding of a location signal characterized by a clear difference in its sensitivity to the two view conditions. These new findings suggest that the location signal in the primate temporal lobe areas may represent a view-centered background image, which could specify the current gaze position within a scene (Fig. 7).
This view-centered background may be represented automatically in the temporal lobe areas because it was observed in the passive-encoding task as well as in the active-encoding task. The TEv and MTL areas except for the PHC also signaled object information. However, in contrast to the background information, the object information was represented regardless of the view conditions when it was actively encoded. These results from the single-neuron-based analyses were confirmed by the population-coding analyses. Taken together, the present study suggests that the ventral pathway and its downstream regions in the MTL signal view-centered background information represented on the retinotopic space, which may automatically locate the object in a scene when it is viewed with foveal vision.

Figure 7. Parallel scene processing on the retinotopic and relational spaces. (Top) Assume that a subject is in a wheat field and viewing the valley. An eagle is in the parafoveal vision of the subject in view A, while it is in the subject's foveal vision in view B. (Middle) When the subject attends to the eagle, either voluntarily or involuntarily, the eagle would be selected as an object from the retinotopic image and processed on the relational space (right) regardless of its original retinotopic position. Conversely, background images would be automatically captured and processed on the retinotopic space, which accordingly specifies the location of the viewpoint in the scene (left). (Bottom) The location of the object in the scene would be assigned by associating the view-centered background with the object. This model hypothesizes that the first person's perspective of a scene containing objects depends on parallel visual processing on the retinotopic and relational spaces and on their association. The original painting is titled Wheat Field with Cypresses by Vincent Willem van Gogh.

One naïve question about the gaze-related location signal might be whether it could be explained by nonvisual sensory/motor information reflecting the animals' eye positions relative to their heads.
Our previous study indicated that neurons in the TEv and MTL areas responded differently to the same gaze positions depending on the position of the large background square within the display (leftward or rightward) (Chen and Naya 2020), suggesting that the gaze-related location signal reflects visual inputs rather than somatosensory/motor-related information about the gaze itself. In the present study, we characterized this location signal, which was widely distributed over the temporal lobe areas, by revealing the underlying visual inputs. We changed the visual inputs on the retinotopic space by presenting an object stimulus whose spatial configuration in the environment (e.g., relative to the large background square) was the same on the relational space in the F-V and P-V conditions. If neurons had signaled the location information of an object equivalently between the two view conditions, it would have suggested that the neurons signaled an interrelation between the object and other spatial structure in the environment, such as the large gray square behind it, which would support the relational space for the representation of object location information. Instead, the present study showed different amplitudes of the location signal between the two view conditions in each of the areas (Figs 3 and 5). In addition, the individual neurons exhibited negatively correlated location signals (Fig. 4), not only in the passive-encoding task but also in the active-encoding task, in which the same object location information was explicitly required for the following response (Fig. 1). These results may reject the relational space model and instead suggest a retinotopic space model for representing object location, which could be substantiated by the view-centered background while a subject views an object with foveal vision. An important question about the view-centered background information on the retinotopic space might be whether it reflects only the parafoveal vision. In the present study, the location-selective activity depends on the parafoveal vision of the background, which shows an edge of the large gray square or the display frame. However, some neurons exhibited location-selective activity only after sample presentation in the F-V condition (Fig. 2B–D) (10.8 and 8.5% across areas in the active- and passive-encoding tasks), which suggests the existence of a neuronal population that represents the view-centered background including foveal as well as parafoveal vision. The view-centered background signal in the present study may explain the response patterns of "spatial view cells" in the HPC (and posterior PRC) reported by Rolls et al. (1997). The spatial view cells show selective responses to a particular location that an animal views, regardless of its standing position. This allocentric coding property of the spatial view cells could be due to similar visual inputs when an animal views the same location from different positions. In contrast to the location signal, which may reflect the background information on the retinotopic space, the object signal was detected regardless of its retinotopic position in the active-encoding task (Fig. 7), which confirms the preceding literature showing the spatial invariance of object representation in the IT cortex (Miyashita and Chang 1988; Nakamura et al. 1994).
The representation of an object may be explained by the spatial relationships among its internal elements, which necessarily accompany its transformation from retinotopic space into relational space (Connor and Knierim 2017). The present study suggests that neurons in the temporal lobe signal the location information of an object as its background image represented in retinotopic space (Fig. 7). Given the present experimental setup, the background image encoded by neurons in the TEv and MTL areas should cover more than 30 degrees of visual angle (diameter) to include the edge of the large gray background square, which may cause the different responses according to gaze position. Like the object signal, the large-scale background image is reportedly processed along the ventral pathway (Kornblith et al. 2013; Vaziri et al. 2014). One remaining question is whether the processing of the background image in the ventral pathway also yields more generalized spatial features (e.g., field, valley, forest), which may be represented in relational space and serve to recognize an entire scene (e.g., suburb rather than modern city) regardless of gaze position.

In addition to the view conditions testing the representational spaces (i.e., relational vs. retinotopic), the object and background signals showed different sensitivities to task demands in the present study. The background signal was encoded irrespective of the task demand, while the object signal was encoded only in the active-encoding task. The automatic encoding of the background signal suggests that when we direct our gaze toward an object to obtain its high-resolution image, we spontaneously receive the spatial information, which would then be assigned to the object (Chen and Naya 2020). One remaining problem concerning the object signal is whether the lack of item-selective activity in the passive-encoding task is due to the present stimulus set (i.e., Chinese characters), because IT neurons reportedly respond to object stimuli such as faces in passive-viewing tasks (Tsao et al. 2003; Kiani et al. 2007). Compared with a natural object such as a face, the fabricated 2D stimuli used in the present study may not elicit the bottom-up attention required for them to be perceived as objects. In the active-encoding task, the monkeys learned the Chinese characters so as to discriminate one from another (Nakahara et al. 2016). The repetitive training in the active-encoding task might produce a long-term learning effect on the stimuli that induces bottom-up attention, which may lead to a transformation of the representations of the Chinese characters from retinotopic space into relational space. Although we cannot determine whether the attention was bottom-up or top-down, the attention-dependent object signal and the attention-independent background signal may both derive from figure-ground segmentation, which reportedly occurs in V4, an early stage of the ventral pathway (Roe et al. 2012). Previous studies have focused on the object information that is selected by this segmentation, and have suggested that the object representation is transformed from retinotopic space into relational space as neurons' receptive fields enlarge along the ventral pathway (Connor and Knierim 2017).
We hypothesize that the background information, which is filtered out at the figure-ground segmentation stage, propagates along the ventral pathway with its representation remaining in retinotopic space rather than relational space. Our previous report demonstrated that the two distinct signals, which are segmented from the same retinal image, are integrated step by step from the TEv through the PRC to the HPC (Chen and Naya 2020). From the ventral stream to the MTL areas, the strongest integration effect was found in the PRC at the single-neuron level. This integration process may be related to the largest gamma-band LFP activity being observed in the PRC when the monkey gazed at an object to actively encode its identity and location (Fig. 6). In the present study, the PHC represented the view-centered background signal with properties similar to those in the TEv and the other MTL areas including the PRC. Conversely, the PHC carried little item signal compared with the other recording regions, which is consistent with previous neurophysiological studies (Sato and Nakamura 2003; Chen and Naya 2020). Rather than object representation, the PHC has been considered to process spatial/contextual information (Eichenbaum et al. 2007; Ranganath and Ritchey 2012). Considering the heavier projections from the posterior parietal cortex to the PHC compared with the AIT cortex including the PRC (Kravitz et al. 2011), the PHC may process spatial information related to eye/self-movement in addition to the view-centered background. These two types of spatial information (i.e., view-centered background and eye/self-position) may be combined in the PHC, and the integrated information might be transmitted to the HPC to construct an environmental scene via the ERC, in which neurons reportedly show eye-position effects (Killian et al. 2015; Killian and Buffalo 2018; Meister and Buffalo 2018). Contributions of the PHC to this scene-construction process may become apparent when subjects perceive the environment by moving their gaze (Zhang and Naya 2020), in which case multiple views must be coordinated according to eye/self-movements, beyond the encoding of the single snapshot focused on one object that was investigated in the present study. We propose a future study to investigate how multiple past views influence the present view to build the current first person's perspective (Tulving 2002; Eichenbaum et al. 2007; Palombo et al. 2015), which may be related to the encoding of episodic memory.

Notes
We thank E.T. Rolls, W.A. Suzuki, M. Zhang, K.W. Koyano, and C. Yang for helpful comments, S. Xue for expert animal care, and the National Center for Protein Sciences at Peking University for assistance in the acquisition of structural magnetic resonance images. Conflict of Interest: The authors declare no competing financial interests.

Funding
National Natural Science Foundation of China (grants 31421003 and 31871139 to Y.N.).

Author Contributions
Y.N. designed the experiments. H.C. performed the experiments. H.C. and Y.N. analyzed data and wrote the manuscript.

References
Burgess N. 2008. Spatial cognition and the brain. Ann N Y Acad Sci. 1124(1):77–97.
Chen H, Naya Y. 2020. Forward processing of object-location association from the ventral stream to medial temporal lobe in nonhuman primates. Cereb Cortex. 30(3):1260–1271.
Connor CE, Knierim JJ. 2017. Integration of objects and space in perception and memory. Nat Neurosci. 20(11):1493–1503.
Dilks DD, Julian JB, Kubilius J, Spelke ES, Kanwisher N. 2011. Mirror-image sensitivity and invariance in object and scene processing pathways. J Neurosci. 31(31):11305–11312.
Eichenbaum H, Yonelinas AP, Ranganath C. 2007. The medial temporal lobe and recognition memory. Annu Rev Neurosci. 30:123–152.
Engel AK, Fries P. 2010. Beta-band oscillations—signalling the status quo? Curr Opin Neurobiol. 20(2):156–165.
Epstein R, Kanwisher N. 1998. The parahippocampal place area: a cortical representation of the local visual environment. NeuroImage. 7(4):S341.
Epstein RA, Julian JB. 2013. Scene areas in humans and macaques. Neuron. 79(4):615–617.
Freud E, Plaut DC, Behrmann M. 2016. ‘What’ is happening in the dorsal visual pathway. Trends Cogn Sci. 20(10):773–784.
Goodale MA, Milner AD. 1992. Separate visual pathways for perception and action. Trends Neurosci. 15(1):20–25.
Haxby JV, Grady CL, Horwitz B, Ungerleider LG, Mishkin M, Carson RE, Herscovitch P, Schapiro MB, Rapoport SI. 1991. Dissociation of object and spatial visual processing pathways in human extrastriate cortex. Proc Natl Acad Sci U S A. 88(5):1621–1625.
Hong H, Yamins DL, Majaj NJ, DiCarlo JJ. 2016. Explicit information for category-orthogonal object properties increases along the ventral stream. Nat Neurosci. 19(4):613–622.
Jarvis MR, Mitra PP. 2001. Sampling properties of the spectrum and coherency of sequences of action potentials. Neural Comput. 13(4):717–749.
Julian J, Epstein R. 2013. The landmark expansion effect: navigational relevance influences memory of object size. J Vis. 13(9):49.
Kiani R, Esteky H, Mirpour K, Tanaka K. 2007. Object category structure in response patterns of neuronal population in monkey inferior temporal cortex. J Neurophysiol. 97(6):4296–4309.
Killian NJ, Buffalo EA. 2018. Grid cells map the visual world. Nat Neurosci. 21(2):161–162.
Killian NJ, Potter SM, Buffalo EA. 2015. Saccade direction encoding in the primate entorhinal cortex during visual exploration. Proc Natl Acad Sci U S A. 112(51):15743–15748.
Kobatake E, Tanaka K. 1994. Neuronal selectivities to complex object features in the ventral visual pathway of the macaque cerebral cortex. J Neurophysiol. 71(3):856–867.
Kornblith S, Cheng X, Ohayon S, Tsao DY. 2013. A network for scene processing in the macaque temporal lobe. Neuron. 79(4):766–781.
Kravitz DJ, Saleem KS, Baker CI, Mishkin M. 2011. A new neural framework for visuospatial processing. Nat Rev Neurosci. 12(4):217–230.
Kriegeskorte N, Mur M, Bandettini P. 2008. Representational similarity analysis—connecting the branches of systems neuroscience. Front Syst Neurosci. 2:4.
Matsui T, Koyano KW, Koyama M, Nakahara K, Takeda M, Ohashi Y, Naya Y, Miyashita Y. 2007. MRI-based localization of electrophysiological recording sites within the cerebral cortex at single-voxel accuracy. Nat Methods. 4(2):161–168.
Meister MLR, Buffalo EA. 2018. Neurons in primate entorhinal cortex represent gaze position in multiple spatial reference frames. J Neurosci. 38(10):2430–2441.
Mishkin M, Ungerleider LG. 1982. Contribution of striate inputs to the visuospatial functions of parieto-preoccipital cortex in monkeys. Behav Brain Res. 6(1):57–77.
Miyashita Y, Chang HS. 1988. Neuronal correlate of pictorial short-term memory in the primate temporal cortex. Nature. 331(6151):68–70.
Mormann F, Kornblith S, Cerf M, Ison MJ, Kraskov A, Tran M, Knieling S, Quiroga RQ, Koch C, Fried I. 2017. Scene-selective coding by single neurons in the human parahippocampal cortex. Proc Natl Acad Sci U S A. 114(5):1153–1158.
Nakahara K, Adachi K, Kawasaki K, Matsuo T, Sawahata H, Majima K, Takeda M, Sugiyama S, Nakata R, Iijima A, et al. 2016. Associative-memory representations emerge as shared spatial patterns of theta activity spanning the primate temporal cortex. Nat Commun. 7(1):11827.
Nakamura K, Matsumoto K, Mikami A, Kubota K. 1994. Visual response properties of single neurons in the temporal pole of behaving monkeys. J Neurophysiol. 71(3):1206–1221.
Naya Y, Chen H, Yang C, Suzuki WA. 2017. Contributions of primate prefrontal cortex and medial temporal lobe to temporal-order memory. Proc Natl Acad Sci U S A. 114(51):13555–13560.
Naya Y, Suzuki WA. 2011. Integrating what and when across the primate medial temporal lobe. Science. 333(6043):773–776.
Palombo DJ, Alain C, Söderlund H, Khuu W, Levine B. 2015. Severely deficient autobiographical memory (SDAM) in healthy adults: a new mnemonic syndrome. Neuropsychologia. 72:105–118.
Pesaran B, Pezaris JS, Sahani M, Mitra PP, Andersen RA. 2002. Temporal structure in neuronal activity during working memory in macaque parietal cortex. Nat Neurosci. 5(8):805–811.
Ranganath C, Ritchey M. 2012. Two cortical systems for memory-guided behaviour. Nat Rev Neurosci. 13(10):713–726.
Roe AW, Chelazzi L, Connor CE, Conway BR, Fujita I, Gallant JL, Lu H, Vanduffel W. 2012. Toward a unified theory of visual area V4. Neuron. 74(1):12–29.
Rolls ET. 2018. The storage and recall of memories in the hippocampo-cortical system. Cell Tissue Res. 373(3):577–604.
Rolls ET, Robertson RG, Georges-François P. 1997. Spatial view cells in the primate hippocampus. Eur J Neurosci. 9(8):1789–1794.
Sato N, Nakamura K. 2003. Visual response properties of neurons in the parahippocampal cortex of monkeys. J Neurophysiol. 90(2):876–886.
Schenk T. 2010. Visuomotor robustness is based on integration not segregation. Vis Res. 50(24):2627–2632.
Suzuki WA, Amaral DG. 2003. Perirhinal and parahippocampal cortices of the macaque monkey: cytoarchitectonic and chemoarchitectonic organization. J Comp Neurol. 463(1):67–91.
Tsao DY, Freiwald WA, Knutsen TA, Mandeville JB, Tootell RB. 2003. Faces and objects in macaque cerebral cortex. Nat Neurosci. 6(9):989–995.
Tulving E. 2002. Episodic memory: from mind to brain. Annu Rev Psychol. 53:1–25.
Vaziri S, Carlson ET, Wang Z, Connor CE. 2014. A channel for 3D environmental shape in anterior inferotemporal cortex. Neuron. 84(1):55–62.
Zhang B, Naya Y. 2020. Medial prefrontal cortex represents the object-based cognitive map when remembering an egocentric target location. Cereb Cortex. bhaa117. doi:10.1093/cercor/bhaa117.
Zhaoping L. 2019. A new framework for understanding vision from the perspective of the primary visual cortex. Curr Opin Neurobiol. 58:1–10.

© The Author(s) 2020. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permission@oup.com. This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model).