Sensory processing in the brain includes three key operations: multisensory integration—the task of combining cues into a single estimate of a common underlying stimulus; coordinate transformations—the change of reference frame for a stimulus (e.g., retinotopic to body-centered) effected through knowledge about an intervening variable (e.g., gaze position); and the incorporation of prior information. Statistically optimal sensory processing requires that each of these operations maintains the correct posterior distribution over the stimulus. Elements of this optimality have been demonstrated in many behavioral contexts in humans and other animals, suggesting that the neural computations are indeed optimal. That the relationships between sensory modalities are complex and plastic further suggests that these computations are learned—but how? We provide a principled answer, by treating the acquisition of these mappings as a case of density estimation, a well-studied problem in machine learning and statistics, in which the distribution of observed data is modeled in terms of a set of fixed parameters and a set of latent variables. In our case, the observed data are unisensory-population activities, the fixed parameters are synaptic connections, and the latent variables are multisensory-population activities. In particular, we train a restricted Boltzmann machine with the biologically plausible contrastive-divergence rule to learn a range of neural computations not previously demonstrated under a single approach: optimal integration; encoding of priors; hierarchical integration of cues; learning when not to integrate; and coordinate transformation. The model makes testable predictions about the nature of multisensory representations.

Citation: Makin JG, Fellows MR, Sabes PN (2013) Learning Multisensory Integration and Coordinate Transformation via Density Estimation. PLoS Comput Biol 9(4): e1003035. doi:10.1371/journal.pcbi.1003035

Editor: Gunnar Blohm, Queen's University, Canada

Received November 5, 2012; Accepted March 3, 2013; Published April 18, 2013

Copyright: © 2013 Makin et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This work was supported by Reorganization and Plasticity to Accelerate Injury Recovery (REPAIR; N66001-10-C-2010, http://www.darpa.mil/) and NIH NEI (EY015679, http://www.nih.gov/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: [email protected]

These authors contributed equally to this work.

Author Summary

Over the first few years of their lives, humans (and other animals) appear to learn how to combine signals from multiple sense modalities: when to "integrate" them into a single percept, as with visual and proprioceptive information about one's body; when not to integrate them (e.g., when looking somewhere else); how they vary over longer time scales (e.g., where in physical space my hand tends to be); as well as more complicated manipulations, like subtracting gaze angle from the visually perceived position of an object to compute the position of that object with respect to the head—i.e., "coordinate transformation." Learning which sensory signals to integrate, or which to manipulate in other ways, does not appear to require an additional supervisory signal; we learn to do so, rather, based on structure in the sensory signals themselves. We present a biologically plausible artificial neural network that learns all of the above in just this way, but by training it for a much more general statistical task: "density estimation"—essentially, learning to be able to reproduce the data on which it was trained. This also links coordinate transformation and multisensory integration to other cortical operations, especially in early sensory areas, that have been modeled as density estimators.

Introduction

The brain often receives information about the same feature of the same object from multiple sources; e.g., in a visually guided reach, both vision and proprioception provide information about hand location. Were both signals infinitely precise, one could simply be ignored; but fidelity is limited by irrelevant inputs, intrinsic neural noise, and the spatial precision of the transducers, so there are better and worse ways to use them. The best will not throw away any information—in Bayesian terms, the posterior probability over the stimulus given the activities of the integrating neurons will match the corresponding posterior given the input signals. Encoding in the integrating neurons the entire posterior for each stimulus, and not merely the best point estimate, is crucial because this distribution contains information about the confidence of the estimate, which is required for optimal computation with the stimulus estimate [1,2]. A sensible code will also "compress" the information—for example, by representing it in fewer neurons—otherwise the brain could simply propagate forward independent copies of each sensory signal.

Psychophysical evidence suggests that animals—and therefore their brains—are indeed integrating multisensory inputs in such an "optimal" manner. Human subjects appear to choose actions based on the peak of the optimal posterior over the stimulus, given a variety of multisensory inputs [1,3–7]. Prism and virtual-feedback adaptation experiments [8–12] have demonstrated the plasticity of these multisensory mappings, and that plasticity is likely not limited to recalibration: deprivation studies [13]; afferent re-routing experiments [14,15]; the ability to learn novel, cross-modal mappings; and genetic-information constraints together suggest that integration is learned, with the organization of association cortices driven by sensory data.
A plausible neural model of multisensory integration, then, must learn without supervision how to combine optimally signals from two or more input populations, as well as a priori information, encoding both the most likely estimate and the certainty about it—even when the relationship between the signal spaces is nonlinear (like retinotopic and proprioceptive-encoded hand location), and when their relationship is mediated by another variable (like gaze angle). Existing computational models of multisensory integration or cross-modal transformation neglect one or more of these desiderata (see Discussion).

Here we show that the task of integration can be reformulated as latent-variable density estimation, a problem from statistics that can be implemented by a neural network, and the foregoing requirements thereby satisfied. The goal is to learn a data distribution (here, the activities of populations of visual and somatosensory neurons while they report hand location in their respective spaces) in terms of a set of parameters (synaptic strengths) and a set of unobserved variables (downstream, integrating neurons). In particular, we model the cortical association area with a restricted Boltzmann machine (RBM), an undirected generative model trained with a fast, effective Hebbian learning rule, contrastive divergence [16,17]. By making the machine a good model of the distribution of the training data, learning obliges the downstream units to represent their common underlying causes—here, hand location. The same formulation turns out to be equally suited to coordinate transformation as well.
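For concreteness, the "up, down, and back up" training step (one-step contrastive divergence; see also Fig. 1B below) can be sketched in a few lines. This is a minimal NumPy sketch under our own assumptions, not the authors' code: the exponential-family parameterization of the Poisson visible units, the rate clipping, the learning rate, and the initialization are all illustrative choices, since the paper's Methods are truncated in this excerpt.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Sizes from the text: two 30x30 input populations -> 1800 visible
# units; a hidden layer of half that number.
N_VIS, N_HID = 1800, 900

W  = 0.01 * rng.standard_normal((N_VIS, N_HID))  # symmetric weights
bv = np.zeros(N_VIS)                             # visible biases
bh = np.zeros(N_HID)                             # hidden biases

def sample_h(v):
    """Bernoulli hidden units conditioned on the visible layer."""
    p = sigmoid(v @ W + bh)
    return (rng.random(p.shape) < p), p

def sample_v(h):
    """Poisson visible units conditioned on the hidden layer
    (assumed exponential-family form: log-rate linear in h)."""
    rate = np.exp(np.clip(h @ W.T + bv, None, 5.0))
    return rng.poisson(rate)

def cd1_update(v0, lr=1e-4):
    """One-step contrastive divergence: up, down, and back up
    (cf. Fig. 1B), followed by a Hebbian/anti-Hebbian weight change."""
    global W, bv, bh
    h0, p0 = sample_h(v0)    # up
    v1     = sample_v(h0)    # down
    _,  p1 = sample_h(v1)    # back up
    W  += lr * (v0[:, None] * p0[None, :] - v1[:, None] * p1[None, :])
    bv += lr * (v0 - v1)
    bh += lr * (p0 - p1)
```

On each training trial, the world-driven vector of Poisson spike counts from the two input populations would be passed in as v0.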
Figure 1. Multisensory integration: data and model. (A) The model and example data. World-driven data are generated according to the directed graphical model boxed in the lower right: on each trial, a hand location s and the population gains g_x and g_h for the two sensory modalities are drawn from their respective prior distributions. Given these, a spike count is drawn for each neuron (magenta and orange colored circles) from a Poisson distribution (see Eq. 2), yielding (e.g.) the set of firing rates shown by the heat maps at left. The center of mass of each population is marked with an ×. The visual (magenta) and proprioceptive (orange) neural populations each encode the location of the hand, but in different spaces: 2D Cartesian space and joint space, respectively, drawn in outline in the heat maps. Since the neurons' preferred stimuli uniformly tile their respective spaces (indicated by the grids), but the forward kinematics relating these variables is nonlinear (inset; joint limits are indicated with red shading, joint origins with black lines), hand position is encoded differently by the two populations. These population codes also constitute the input layer, R, of the RBM (lower right). Its hidden units, V, are Bernoulli conditioned on their inputs, corresponding to the presence or absence of a spike. The green heat map in the upper right depicts the mean of 15 samples from the hidden layer of a trained network for the example inputs shown at left. (B) Testing and training. In the first step of training (first panel), the external world elicits a vector of Poisson spikes from the input layer, driving recurrent activity in the neural network—up, down, and back up (second through fourth panels). The weights are then adapted according to the one-step contrastive-divergence rule. Testing also begins with a world-driven vector of Poisson spikes from the input populations, which drives 15 samples of hidden-layer activity (second panel). We then decode the input and hidden layers, yielding their respective posterior distributions. doi:10.1371/journal.pcbi.1003035.g001

Results

A network that has learned to perform the integration task will transmit to downstream neurons (v), on each trial, all the information in its inputs (r) about the stimulus (s). In our case, that network is the RBM, the stimulus is the location of the hand, and the inputs are two neural populations (visual and proprioceptive) encoding hand location in different spaces (Fig. 1A; see also Methods). Equivalently, integration requires that the posterior distribution over the stimulus given the activities of the downstream ("hidden" or "multisensory") units, q(s|v), match the posterior over the stimulus given the two inputs, p(s|r). Henceforth, we call the latter of these distributions the optimal posterior, since it serves as the benchmark for performance. Having arranged, by our choice of input-population encoding, for the optimal posterior to be Gaussian (see Methods), its statistics consist only of a mean and a covariance. Thus, to show that the network successfully integrates its inputs, we need show only that these two cumulants can be recovered from the multisensory neurons—intuitively, that they have learned to encode the optimal stimulus location and the confidence in that location, respectively. We emphasize that throwing away covariance (or other statistical information) would render subsequent computations suboptimal: for example, if the integrated estimate is itself to be integrated downstream with another modality, it must be weighted by its own precision, i.e., inverse covariance (see Text S1 and Hierarchical networks below).
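The precision weighting referred to here takes, for Gaussian posteriors and a flat prior, the standard cue-combination form (a sketch consistent with the text's references to Eq. 3 in Methods, which is not included in this excerpt): with unisensory estimates $\hat{s}_x$ and $\hat{s}_h$ and posterior covariances $\Sigma_x$ and $\Sigma_h$,

$$\Sigma^{-1} = \Sigma_x^{-1} + \Sigma_h^{-1}, \qquad \hat{s} = \Sigma\left(\Sigma_x^{-1}\hat{s}_x + \Sigma_h^{-1}\hat{s}_h\right),$$

so that each input is weighted by its precision (inverse covariance), which, per the Methods below, scales with its population gain.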
Multisensory integration in the RBM

We begin by examining the ability of our model to perform optimal multisensory integration, in the sense just described. We use our "standard" network, with a visible layer of 1,800 Poisson units, comprising two 30×30 input populations, and a hidden layer of half that number of Bernoulli units. We trained and tested this network on separate datasets, with stimuli chosen uniformly in the 2D space of joint angles (see Methods and Fig. 1B).

Decoding the posterior mean. We first show that the hidden layer successfully encodes the optimal-posterior mean. For a fixed stimulus location, s, we compare the distribution of the stimulus decoded from 15 samples of the hidden units, ŝ_RBM(v) ("RBM-based estimate"; see Methods), with the distribution of the optimal-posterior mean, ŝ_MAP(r). (The latter estimate also has a distribution across trials, even for a fixed stimulus, because the input encodings are noisy.) We compare the distributions of these two estimates, rather than simply examining the distribution of their difference, because the resulting figures (Fig. 2A) then resemble those typically presented in psychophysical studies, where behavior plays the role of the estimate—and indeed, has been found to correspond to the optimal-posterior mean [1].

Fig. 2A shows the mean and covariance of the conditional estimator distributions, p(ŝ|s), for various stimulus locations s, and for four separate estimates of the posterior mean: the MAP estimate using the visual-population activities (magenta), the MAP estimate using the proprioceptive-population activities (orange), the MAP estimate using both input populations (the "optimal" posterior mean; black), and the estimate using the hidden-layer activities ("RBM-based integrated estimate"; green). Each ellipse depicts the 95% confidence interval of the distribution's covariance, centered at its mean, as in all subsequent figures. Clearly, the RBM-based estimate matches the MAP estimate over nearly all of the workspace. Visible errors occur only at the edges of joint space, probably a result of "edge effects," i.e., the proximity of extreme joint angles to regions of space not covered by the (perforce finite) grid of neurons.

We can quantify the contribution of these imperfections to the total optimality of the model. Since the MAP estimate is the unique minimizer of the average (over all stimuli) mean square error, the marginal error distribution,

$$p(\hat{s}-s) = \int p(s)\, p(\hat{s}-s \mid s)\, ds,$$

summarizes all the conditional estimator distributions. These marginal error statistics (Fig. 2B) show that the overall performance of the network is very nearly optimal.
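As a concrete, and purely hypothetical, illustration of such a readout (the paper's actual decoder is specified in its Methods, which are truncated in this excerpt), one could average the 15 hidden samples, reconstruct the visible-layer rates, and take a center of mass over preferred stimuli:

```python
import numpy as np

def decode_posterior_mean(h_samples, W, bv, centers):
    """Hypothetical readout, NOT the paper's decoder: average the 15
    Bernoulli hidden samples, reconstruct the visible layer's Poisson
    rates under the RBM's downward pass, and take a center of mass
    over the input neurons' preferred stimuli `centers`
    (shape (n_visible, 2), in joint angles)."""
    h_bar = h_samples.mean(axis=0)     # mean of the hidden samples
    rates = np.exp(W @ h_bar + bv)     # reconstructed input rates
    weights = rates / rates.sum()
    return weights @ centers           # 2D estimate in joint space
```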
Figure 2. Recovery of the posterior mean. The four ellipses in each plot correspond to the covariances of four different estimates of the stimulus: the MAP estimate of the stimulus using only the visual input population (magenta), the MAP estimate using the proprioceptive input population (orange), the MAP estimate using both populations (i.e., the true posterior mean, which is the optimal estimate; black), and the estimate based on decoding the hidden layer ("RBM-based estimate"; green). (The color conventions are the same throughout the paper.) Each ellipse bounds the 95% confidence interval and is centered at its mean. All results are shown in the space of joint angles in units of radians. (A) Conditional errors. The middle plot shows the conditional errors for a grid of stimulus locations (each centered at the true stimulus); four examples are enlarged for clarity. Note that nontrivial biases arise only at the edges of the workspace. (B) Marginal error statistics. The RBM-based error (green) is unbiased and its covariance closely matches the optimal covariance (black). doi:10.1371/journal.pcbi.1003035.g002

Decoding the posterior covariance. We next show that the hidden layer also encodes the optimal-posterior covariance. The posterior covariance represents the uncertainty, on a single trial, about the true stimulus location, given the specific spike counts on that trial. Since on a single trial only one point from the posterior distribution (presumably the mean) manifests itself in a behavior—e.g., a reach—that trial's posterior covariance cannot be read off the behavior as the posterior mean can. Nevertheless, the posterior covariance has important behavioral consequences across trials: it determines the relative weighting of each input during optimal integration (see Eq. 3b in Methods). This is clearly a requirement for the input populations, VIS and PROP; but if, for example, the multisensory (hidden-unit) estimate, ŝ(v), is itself to be integrated with yet another sensory population at a further stage of processing, optimality of that integration requires knowledge of the posterior covariance, in order to weight ŝ(v) properly. We show in Hierarchical networks below that the model can learn just such an architecture, demonstrating that posterior-covariance information is indeed encoded in the hidden units; but here we exhibit the result more directly.

The posterior precision (inverse covariance) on each trial is a 2×2 symmetric matrix and therefore ostensibly has three degrees of freedom. However, as shown below in Eq. 3a (Methods), the encoding scheme constrains it to a lower-dimensional manifold: the only quantities that change from trial to trial are the "total spike counts," $g^x := \sum_i r^x_i$ and $g^h := \sum_i r^h_i$, and the location where the Jacobian of the forward kinematics is evaluated. The latter is given by the posterior mean, which we have just shown can be reconstructed nearly optimally. Therefore, reconstruction of the posterior precision requires the additional recovery only of the total spike counts of the respective input populations.

Fig. 3A shows the coefficients of determination (R²) for two different estimators of the total spike counts, one using 15 samples from the hidden-layer units (as for the posterior mean above), and the other using hidden-layer means (i.e., infinite samples; see Methods). In all cases, R² values are greater than 0.82, with the infinite-sample decoder approaching 0.9.

How do these values translate into the quantity we really care about, the posterior covariance, and by implication the posterior distribution itself? To quantify this, we employ the standard measure of similarity for distributions, the KL divergence. Since the true posterior is Gaussian, and since the RBM encodes the (nearly) correct mean and variance of q(s|v), it too must be (nearly) Gaussian. (Given a specified mean and finite variance, the maximum-entropy distribution is normal. Thus if q(s|v) and p(s|r) have identical mean and variance, but the latter is Gaussian while the former is not, then the former has lower entropy—which is impossible, since information about S cannot be gained in the transition from R to V.) The KL divergence between two Gaussian distributions has a very simple form, and in fact we make it simpler still by examining only that portion contributed by the covariances—i.e., ignoring mean differences, since we have just examined them in the previous section:

$$\mathrm{KL}\{\mathcal{N}(\mathbf{m},\Sigma_0),\,\mathcal{N}(\mathbf{m},\Sigma_1)\} = \left(\mathrm{trace}(\Sigma_1^{-1}\Sigma_0) - \log\left|\Sigma_1^{-1}\Sigma_0\right| - m\right)/2,$$

where m is the number of dimensions. The first bar of Fig. 3B shows this divergence from the optimal posterior to the RBM-based posterior (again based on 15 samples).

What constitutes a proper point of comparison? Consider a fixed computation of the covariance which uses Eq. 3a but with the average (across all trials) total spike counts, $\bar{g}^x$ and $\bar{g}^h$, rather than their trial-by-trial counterparts. If the model had learned the prior distribution over the total spike counts, but was not actually encoding any trial-by-trial information, it could do no better than this fixed computation. The KL divergence of the optimal posterior from this fixed computation is shown in the second bar of Fig. 3B. The model is clearly far superior, demonstrating that it is indeed transmitting trial-by-trial covariance information.
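This covariance-only divergence is straightforward to compute; a small self-contained check (the numerical covariances below are hypothetical, for illustration only):

```python
import numpy as np

def gaussian_kl_cov_only(S0, S1):
    """KL{N(m,S0) || N(m,S1)} for equal means: the covariance-only
    term used in Fig. 3B, (trace(S1^-1 S0) - log|S1^-1 S0| - m)/2."""
    A = np.linalg.solve(S1, S0)          # S1^{-1} S0
    m = S0.shape[0]
    return 0.5 * (np.trace(A) - np.log(np.linalg.det(A)) - m)

# Two 2x2 posterior covariances (hypothetical values):
S_opt = np.array([[0.010, 0.002], [0.002, 0.008]])
S_rbm = np.array([[0.011, 0.002], [0.002, 0.009]])
print(gaussian_kl_cov_only(S_opt, S_rbm))  # small => posteriors match
```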
Figure 3. Recovery of the posterior distribution. (A) Reconstruction of the input total spike counts, g^h and g^x, for VIS and PROP, resp., from 15 samples of the hidden units ("samples"), and from infinite samples of the hidden units ("means"). Decoding these, along with the posterior mean (demonstrated in Fig. 2), is sufficient to recover the posterior covariance. (B) Average (across all trials) KL divergences from the optimal posterior, p(s | r^h, r^x), for two distributions: (black) the posterior over s given the mean (across trials) total spike counts ($\bar{g}^x$ and $\bar{g}^h$) and the optimal posterior mean, $\hat{s}(\mathbf{r}) := E[S\,|\,\mathbf{r}]$; and (green) the sample-based model posterior, given also the optimal posterior mean. The mean-based model posterior, not shown, is visually indistinguishable. The black bar measures purely the divergence resulting from failure to pass covariance information on to the hidden units. That the RBM-based divergence is so much smaller demonstrates that the model is not merely passing on mean spike-count information, but also its trial-by-trial fluctuations. (C) Percent of total information lost from input to hidden units (measured by the normalized KL divergence between the optimal and RBM-based posteriors; see Text S1), as a function of gains. Information loss is less than about 1.2% for all gains. (D) Posterior distributions (means and covariances) from three randomly selected trials. Color scheme is as throughout; dashed green shows the posterior computed from hidden-unit means (v̄), as opposed to samples (v, solid green). doi:10.1371/journal.pcbi.1003035.g003

Finally, we directly demonstrate the fidelity of the entire model posterior, q(s|v), to the entire optimal posterior, p(s|r), as a function of the population gains, by calculating the fractional information lost in terms of the normalized KL divergence:

$$\text{fractional information lost for fixed } g_1, g_2 := \frac{\big\langle \mathrm{KL}\{p(s \mid \mathbf{r}) \,\|\, q(s \mid \mathbf{v})\}\big\rangle_{q(\mathbf{v}|\mathbf{r})\,p(\mathbf{r}|g_1,g_2)}}{\big\langle \mathrm{KL}\{p(s \mid \mathbf{r}) \,\|\, p(s)\}\big\rangle_{p(\mathbf{r}|g_1,g_2)}} \qquad (1)$$

This quantity is 0 in the best case, when q(s|v) = p(s|r), and 1 in the worst, when q(s|v) = p(s). (See also Text S1 for a more extended discussion.) Fig. 3C shows that slightly more information is lost at low visual gains, but that in fact the slope is very shallow, since all information losses lie between the small amounts of 0.9% and 1.2%. To visualize this small discrepancy, Fig. 3D provides a qualitative comparison of the single-population, dual-population (optimal), and RBM-based posterior distributions, for three random trials. (These are not to be confused with the distribution of the posterior mean, as in Fig. 2A, which is measured across trials.) The match between model and optimal posterior is evident for both covariance (size and shape of the ellipse) and mean (its location).
Effects of hidden-layer size and hidden-layer noise. Figs. 2 and 3 have shown model performance to be "nearly" optimal, in that both the posterior mean and the posterior covariance are encoded in the hidden layer. The small deviations from optimality can result from two distinct causes: (1) the network having failed to learn the ideal information-preserving transformation, and (2) the noise in the hidden layer having corrupted that transformation. In order to gauge the relative contribution of the two, we re-tested the model under a range of different capacities and noise levels, by varying the number of hidden units and the number of samples taken at the hidden layer, respectively. Note that since the hidden units are Bernoulli, increasing the number of samples is akin to increasing the time window over which mean rates of activity are computed. Our assay is the error in the RBM-based estimate of the posterior mean; and since we observe that only the size, rather than the shape or position, of the error-covariance ellipse is greatly distorted as a function of decreasing samples, for simplicity we plot only the determinant of the error-covariance matrix.

Fig. 4 shows that, as expected, the error measure decreases both with more hidden units and with more samples. However, a comparison of the different curves shows that the error asymptotes at N hidden units (cyan line), which is the number of units in one input population—increasing the hidden layer beyond that has no effect on performance. Performance also asymptotes at around ten samples per unit. At asymptote, the errors are close to optimal (solid black line), and much better than the single-input (PROP) error (dashed black line). (The VIS determinant is much larger and is therefore omitted.)

Fig. 4 also shows the error for a network with 5N hidden units and the use of means (equivalent to taking infinite samples) in the hidden layer (dotted black line). This error lies about halfway between the optimal and asymptotic RBM-based errors, showing that about half that network's suboptimality is due to noise, and half due to network architecture and the learning algorithm; but in any case the network performance is quite close to optimal.

Figure 4. Dependence of error covariance on numbers of samples and hidden units. Networks with different numbers of hidden units (see legend; N = number of units in a single input population) were trained on the input data, and then decoded for the posterior mean in the usual way but using different numbers of samples from the hidden layer (abscissa) before averaging. The determinants of the resulting error covariances are plotted here with colored lines. Dashed line, MAP error covariance using only proprioceptive input; solid line, optimal error covariance; dotted line, error covariance from the 5N network when using means in the hidden layer—i.e., infinite samples—the asymptote of the colored lines. doi:10.1371/journal.pcbi.1003035.g004
Simulating psychophysical studies of optimal integration

We now relate our model to some familiar results from psychophysical investigations of multisensory integration. In the foregoing simulations, the input populations were driven by the same stimulus. The most common experimental probe of integration, however, is to examine the effects of a small, fixed discrepancy between two modalities—imposed with, e.g., prism goggles or virtual feedback [1,3,4,18–20]. Integrated estimates tend to fall between the means of the discrepant inputs, revealing the relative weighting of the two modalities. The mean location of the integrated estimate therefore allows experimenters to assess integration without having to obtain reliable estimates of the error covariance. Notice that this point will not necessarily lie along the straight line connecting the input means, since the sensory covariances need not be aligned [1].

To replicate these experiments, the trained network from Fig. 2 was tested on sets of "shifted" data in which joint angles had been displaced from their corresponding visual locations by a fixed quantity, the "input discrepancy," before being encoded in the PROP population. To determine how large to make this discrepancy, we returned to the original, unshifted data. Although the average discrepancy between the two inputs in this data set is zero (as seen in the locations of the magenta and orange ellipses in Fig. 2), the noisy encoding renders the discrepancy on single trials non-zero, with the probability of finding such a discrepancy determined by the sum of the input covariances, $(\Sigma_x + \Sigma_h) =: \Sigma_{\mathrm{IN}}$. This quantity providing, then, a natural measure of discrepancy, each set of shifted data was created with an input discrepancy of K standard deviations of $\Sigma_{\mathrm{IN}}$, with K ∈ {2.5, 5, 7.5, 10, 12.5, 15}. Note that large K enables a further investigation—into the generalization of the trained network: the extent to which the RBM's optimality is maintained as the input discrepancy grows indicates, qualitatively, the generalization powers of the machine on these data.
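One way to construct such a shifted dataset (a sketch of the construction just described; the function name and the choice of shift direction are ours):

```python
import numpy as np

def shift_joint_angles(theta, Sigma_in, K, direction):
    """Displace the PROP stimulus by K 'standard deviations' of the
    summed input covariance Sigma_in = Sigma_x + Sigma_h along a unit
    direction, before encoding it in the PROP population (the VIS
    stimulus is left untouched)."""
    u = direction / np.linalg.norm(direction)
    sd = np.sqrt(u @ Sigma_in @ u)   # std. dev. of Sigma_in along u
    return theta + K * sd * u

# e.g., K in {2.5, 5, 7.5, 10, 12.5, 15}, as in the text
# (the direction here is a hypothetical choice):
# theta_shifted = shift_joint_angles(theta, Sigma_in, 2.5,
#                                    np.array([1.0, 0.0]))
```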
Fig. 5A shows the error statistics for these testing datasets for several discrepancy magnitudes along a single direction (discrepancies along other directions, not shown, were qualitatively similar). Psychophysicists examine conditional errors, but again for generality we have averaged across stimulus locations to produce marginal errors. The RBM-based estimator (green) becomes noticeably suboptimal by 7.5 standard deviations. Furthermore, the distribution of errors becomes distinctly non-normal for large input discrepancies, spreading instead over the arc connecting the centers of the input error distributions. This arc corresponds to the location of the optimal estimate for varying relative sizes of the input error covariances [1]. Whether such a pattern of errors is exhibited by human or animal subjects is an interesting open question.

Another way of measuring machine generalization is to test its performance under gain regimes outside its training data. Since no discrepancy is enforced between the modalities, biases should be zero. Performance should be approximately optimal in the training regime, where gains ranged from 12 to 18 spikes. And indeed, Fig. 5B shows that neither the error covariance (the relative shapes of the green and black ellipses) nor the bias (the relative positions of the green and black ellipses) is noticeably worse than in the training regime until the gain ratios (PROP/VIS) reach the extreme values on the plot.

Finally, we examine machine performance under both input discrepancy and gain modulation, with a constant input discrepancy of 2.5 standard deviations and various gain ratios (Fig. 5C). The black and green dotted lines, nearly identical, track the movement of the error means of the optimal and RBM-based estimators, respectively. This reproduces the familiar psychophysical finding that varying the relative reliability of two discrepant inputs will bias downstream activity (sc., behavior) toward the more reliable modality's estimate [1].

Figure 5. Model generalization across input discrepancies and input gains. After training, the model was tested on data that differ from its training distribution. (A) Discrepant-input data: PROP input (orange) is shifted by progressively greater increments of the input covariance (see text), leading to suboptimal integration, as expected, and structured error distributions. The hidden-layer error mean, like the optimal error mean, shifts rightward with the PROP "bias." (B) Gain-modulated data: The training data had gains between 12 and 18. Testing on gains (ratios listed between panels (B) and (C)) outside this regime yields suboptimal error covariances but essentially zero biases. (C) Gain-modulated, input-discrepant data: As the relative reliability of PROP is increased, the optimal estimate shifts toward PROP and away from VIS. The green and black dotted lines, nearly identical, trace this movement for the machine-based and optimal estimates, resp. For larger discrepancies (not shown), this optimal behavior breaks down, the green and black lines diverging. doi:10.1371/journal.pcbi.1003035.g005

Different training data

Learning non-flat priors. So far we trained on stimuli that were chosen uniformly in joint space, so that the posterior mean is simply the peak of the likelihood given the inputs, $p(\mathbf{r}^h,\mathbf{r}^x \mid \theta)$. In general, of course, these quantities are distinct. Since the learning algorithm we employ is a density-estimation algorithm, it is expected to reproduce the marginal density $p(\mathbf{r}^h,\mathbf{r}^x) = \int p(\theta)\,p(\mathbf{r}^h,\mathbf{r}^x \mid \theta)\,d\theta$, and thus should learn the prior over the stimulus as well as the likelihood. Therefore, the distribution of hidden-layer activities in the trained model will reflect both of these "input distributions," and we should be able to decode the maximum a posteriori (MAP) estimate from the RBM. Importantly, we use the same decoding scheme as employed throughout (see Methods), ensuring that the prior is instantiated in the RBM rather than in the decoder.

For simplicity, we chose the prior p(θ) to be a tight Gaussian—with covariance on the order of the input covariances—centered in the middle of joint space (see The optimal posterior distribution over the stimulus in Methods). Thus, for a fixed stimulus, the (conditional) optimal estimator will be biased toward the center of joint space relative to that stimulus. Averaged over all stimuli, the (marginal) optimal estimator will be centrally located, but will have smaller error covariance than its flat-prior counterpart—intuitively, the prior information increases the precision of the estimator.

This is precisely what we see for the RBM-based estimate in Fig. 6A,B. Its conditional statistics are shown for six different fixed stimuli in Fig. 6A, along with those of the two unisensory MAP estimates and the optimal estimate (the MAP estimate given both input populations). The corresponding marginal error statistics, averaged over all stimuli under their prior distribution, are shown in Fig. 6B. The RBM-based error covariance, like its optimal counterpart, is tighter than that achieved with a flat prior (cf. Fig. 2B).
Sometimes-decoupled inputs. We have been supposing the model to correspond to a multisensory area that combines proprioception of the (say) right hand with vision. When not looking at the right hand, then, the populations ought to be independent; and an appropriate model should be able to learn this even more complicated dataset, in which the two populations have a common source on only some subset of the total set of examples. This is another well-known problem in psychophysics and computational neuroscience (see e.g. [21]). When the integrating area receives no explicit signal as to whether or not the populations are coupled, the posterior distribution over the right hand is a mixture of Gaussians, which therefore requires the encoding of numerous parameters—two means, two covariance matrices, and a mixing proportion—and is therefore rather complicated to decode. Simulations, omitted here, show that the RBM does indeed learn to encode at least the peaks of the two Gaussians.

A slightly simpler model includes among the input data an explicit signal as to whether the input populations are coupled, in our case by dedicating one neuron to reporting it. This model is shown in Fig. 6C: populations were coupled in only 70% of trials; in the others, the VIS (magenta) population reports the "left hand," and the unit labelled T indicates this by firing at its maximum mean spike count (otherwise it is off). Derivation of the optimal error covariance for the MAP estimate is given in Text S1; intuitively, the model must learn to encode different distributions in its hidden units depending on whether or not T is on. When T is off, these units should integrate the stimulus estimates encoded by the two populations and encode this integrated estimate (and its variance). When T is on, they should encode the proprioceptive stimulus and the visual stimulus separately. The optimal error variance is calculated as a weighted average of the error variances in the two conditions, smaller and larger respectively, the weights being the percentage of the time each condition occurs (0.7 and 0.3, resp.). (The optimal error mean is still zero.) Fig. 6D shows that a network trained on these data—with the same architecture as throughout—again achieves this optimum.
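Written out (a sketch under the Gaussian assumptions used elsewhere in the paper; the full derivation is in Text S1), the benchmark error covariance for the right hand is the condition-weighted average

$$\Sigma_{\text{opt}} = 0.7\,\Sigma_{\text{VIS+PROP}} + 0.3\,\Sigma_{\text{PROP}},$$

while in the variant without the observed toggle, the posterior itself becomes a two-component mixture, $p(s_R \mid \mathbf{r}) = \pi\,\mathcal{N}(\mu_1,\Sigma_1) + (1-\pi)\,\mathcal{N}(\mu_2,\Sigma_2)$, whose means, covariances, and mixing proportion the hidden layer would all have to encode.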
Figure 6. Other data distributions. (A,B) Learning a prior. The network was trained on population codes of an underlying stimulus that was drawn from a Gaussian (rather than uniform, as in the previous figures) prior. This makes the MAP estimate tighter (cf. the black ellipses here and in Fig. 2B)—and indeed the RBM-based estimate's error covariance is correspondingly tighter. (A) Conditional estimate statistics (color scheme as throughout): The output estimates (green) have smaller covariances, but they, like the optimal estimates (black), are also biased toward the mean of the prior, located at the center of the workspace. The match between them is evidently very good. Note that the stimulus location for each of these conditional statistics is eight standard deviations from the mean of the prior—so the model has generalized well to points that constituted a trivial fraction of the training data. (B) Marginal error statistics. (C,D) Learning that the inputs need not report the same stimulus. (C) A graphical model showing the independence relationships holding among the variables of this model. The (observed) toggle T determines whether the visual population is reporting the left (S_L) or right (S_R) hand. (D) Marginal error statistics (colors as throughout) for the mean of the posterior distribution over the right hand. Since the visual population provides information about the right hand only 70% of the time, the optimal error covariance is broader than its counterpart in Fig. 2B. The RBM-based estimate again nearly matches it. doi:10.1371/journal.pcbi.1003035.g006

Other architectures

Hierarchical networks. A plausible neural model of multisensory integration will be composable, in the sense that the integrating neurons can themselves serve as an input population for further integration with, e.g., another modality. Fig. 7A illustrates the architecture of one such network. As above, input layers are Poisson and hidden layers are Bernoulli. The first RBM is the same as in the foregoing results; the second was trained on an input layer comprising the hidden-layer population of the first RBM ("integrated representation 1") and a new input population ("PROP 2"), which for simplicity encodes joint angles, just as the first-layer proprioceptive population ("PROP 1") does—though of course the population activities are different, since these are noisy. The second population also has a different gain on each trial (see the bottom panel of Fig. 7A).

Again we focus on the error statistics of the posterior mean (Fig. 7B). Both integrated representation 1 (using two inputs) and integrated representation 2 (using all three inputs) approach their optimal values. Although these error statistics are direct measures of posterior-mean encoding only, that the posterior variance is being encoded is demonstrated indirectly as well: proper integration at the second level requires variance information to be encoded in the first hidden layer. The (nearly) optimal error statistics for the second layer show that indeed the posterior-variance information is encoded on a per-trial basis in the (first) hidden layer.
Coordinate transformation. We consider now another, seemingly different, computational problem studied in the sensorimotor literature: coordinate transformation (sometimes called "sensory combination" [22]). In general, the relationship between proprioception and visually encoded stimuli is mediated by other quantities—gaze angle, head-to-body orientation, body-to-arm orientation, etc.—which are themselves random variables. In the simplest version, the relationship of vision to proprioception depends only upon gaze position, X = F(H) − E, and the "stimuli" consist of two independent random variables, H and E [23]. Fig. 7C depicts a probabilistic graphical model for this scenario, along with the RBM that is to learn these data (cf. Fig. 1A). The optimality equations are slightly more complicated for this problem (see Coordinate Transformations in Text S1), but conceptually similar to those of simple multisensory integration (Eq. 3).
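A sketch of the generative process for this dataset (the kinematics function F and the sampling ranges below are our own stand-ins, since the text does not specify them in this excerpt; the gains follow the text):

```python
import numpy as np

rng = np.random.default_rng(1)

def forward_kinematics(h):
    """Placeholder for the 1D forward kinematics F in X = F(H) - E;
    a simple monotonic nonlinearity stands in for it here."""
    return np.sin(h)

def draw_trial():
    """One trial of the coordinate-transformation dataset: H and E
    are independent 'stimuli'; the visually encoded variable is
    X = F(H) - E. Gains as in the text: g_x = 5, g_h = 15, g_e = 5."""
    h = rng.uniform(-1.0, 1.0)      # joint angle (range hypothetical)
    e = rng.uniform(-0.5, 0.5)      # gaze angle  (range hypothetical)
    x = forward_kinematics(h) - e   # retinotopic hand position
    return {"H": h, "E": e, "X": x,
            "gains": {"g_x": 5, "g_h": 15, "g_e": 5}}
```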
Figure 7. Other architectures. (A,B) A "hierarchical" network, in which a third modality must be integrated with the integrated estimate from the first stage—which is just the original model. (A) Data generation (bottom), population coding (middle), and network architecture (top) (cf. Fig. 1A). Input units are Poisson and hidden units (green) are Bernoulli. The population codes, depicted in one dimension for simplicity, are actually 2D. Each hidden layer has half (= 900) the number of units in its input layer (= 1800). (B) Marginal error statistics. The error ellipses for PROP 1 (orange), for VIS (magenta), for both PROP 1 and VIS (dashed black), and for "integrated representation 1" (dashed green) replicate the results from Fig. 2B. PROP 2 is encoded in the same way as PROP 1 (though their activities on a given trial are never equal because of the Poisson noise), and so has identical error statistics (orange). Conditioning on this third population in addition to the other two shrinks the optimal error covariance (solid black), and the estimate decoded from "integrated representation 2" (solid green) is correspondingly smaller as well, and again nearly optimal. (C,D) Coordinate transformation. (C) Data generation (bottom), population coding (middle), and network architecture (top). Each input population (bottom panel, color coded) depends on its own gain, whereas both PROP (h, orange) and VIS (x, magenta) depend on the stimulus (hand position), and both VIS and EYE (e, blue) depend on gaze angle. (D) Mean square errors. The RBM-based estimates have nearly minimal MSEs, demonstrating that these estimates are nearly equal to the mean of the true posterior distribution. Inset: the physical setup corresponding to coordinate transformation. Red shading denotes joint limits; the black line denotes the origin of joint space. doi:10.1371/journal.pcbi.1003035.g007

In this model, the proprioceptive population is responsible for a larger space than either of the other two variables, a consequence of our choice to sample in the space of the latter (see Fig. S2A and related discussion in Tuning of the coordinate-transforming neurons in Text S1). Allocating to each population the same number of neurons, while also demanding that the H variance be small enough for its contribution to affect appreciably the integrated estimate, requires that we increase its relative gain; hence we let g_x = 5, g_h = 15, g_e = 5. In keeping with the simple relationship just given, all variables are one-dimensional; the network allocates 60 units to each population, yielding 180 total input units. The hidden layer has only 160, respecting our requirement that it be smaller than the input layer. (The ratio of hidden to input units was chosen with the following rationale: the input layer encodes six random variables—the three "stimuli," X, H, and E, plus their three associated gains—whereas the hidden layer needs to encode only five, one of the stimuli being redundant with the other two. And indeed, using fewer than 160 hidden units yields suboptimal results. Cf. the "standard" network, for which the input encodes six variables—the two gains and the two 2D stimuli—and the hidden layer encodes four—two gains and a single 2D stimulus. A longer discussion of these approximate calculations can be found in Text S1.) Fig. 7D shows that the mean square errors (MSEs) of the RBM-based estimate of the stimulus are, once again, nearly optimal given the three inputs. (We can show mean and variance together as MSE without loss of generality because the posterior mean is the unique minimizer of the MSE, so showing that the RBM-based estimator achieves minimum MSE shows that it is the posterior mean.) This demonstrates the generality of our approach, as the same network and algorithm will learn to perform multisensory integration or coordinate transformations, depending simply on its inputs (cf. the networks of [24–27], which are built specifically to perform coordinate transformations). Nor is there reason to believe that learnable transformations are limited to simple combinations of the form X = F(H) − E, which was chosen here merely to simplify our own computations of the optimality conditions (see Coordinate Transformations in Text S1).
Properties of the hidden units and their biological implications

We now examine some of the properties of the hidden units, especially those that electrophysiologists have focused on in multisensory neurons in rhesus macaques.

Integrating neurons. Fig. 8A shows tuning curves for a random subset of 16 tuned hidden units in our "standard" multisensory-integration network (Multisensory integration in the RBM). (By "tuned" we mean neurons whose mean firing rate—i.e., probability of firing—varied by more than 0.1 over the stimulus range.) To render tuning more clearly, curves were computed noiselessly—using means in both the input and hidden layers—and with a fixed gain of 15 for both populations. Interestingly, the two-dimensional tuning for joint angles (left column) is multimodal for many cells—although also highly structured, as is apparent from a comparison of tuning in the trained (upper row) and untrained (lower row) networks. Although multimodal tuning has been found in multisensory areas, for example area VIP (see Fig. 3 of [28]), a comparison of these plots with empirical data is complicated by the fact that neurophysiologists typically do not collect tuning data over a complete planar workspace.

We therefore restrict attention to the 1D submanifold of joint space indicated by the black slash through the 2D tuning curves, corresponding to an arc in visual space, since tuning over this range was reported in [29] (see especially the supplement) for multisensory neurons in Area 5 and MIP; we show the corresponding model tuning in the right column for the same sixteen neurons as in the left column. The determination of whether or not model neurons are tuned was made along this arc (rather than over the entire planar workspace); in this limited range, 137 of the 900 hidden units were tuned. Results are qualitatively similar between data and model: along the arc, units in the trained network are unimodal and occasionally monotonic (unlike in the untrained model, bottom right). Furthermore, although none of these 16 cells exhibited bimodal tuning along this arc, from the distribution of planar tuning we expect that some cells would; and indeed a subset of cells in [29] exhibit bimodal tuning (see Fig. 5 and Supplemental Fig. 6C in the cited work).

Fig. 8A also shows how the tuning along the 1D arc depends on the input gains. Although broadly similar across gains, increasing gain does result in a subtle sharpening of the tuning curves. This can be quantified more directly by simply counting the number of active neurons for a given stimulus under different gains: sharper tuning curves will result in fewer neurons firing (though possibly more total spikes). And indeed, after sampling 15 spikes from the hidden layer, the percentages of neurons firing are 22.5, 21.2, and 20.3, for g_x = g_h = 12, 15, and 18, respectively. This is in contrast to the input layer, where an increase in gain increases the number of spiking units. Sharpening is also in contrast to the theoretical predictions of [2], where the hidden layer is a probabilistic population code of the same form as the inputs, with both having the property that higher gains imply a greater number of active neurons. This feature has not been investigated directly in multisensory areas of the cortex, and presents a useful test for the model. Although the absence of sharpening would not rule out a broader class of density-estimation models, it would indeed rule out this particular implementation.
Coordinate-transforming neurons. Investigation of multisensory tuning properties has a longer history for coordinate transformations. Here, especially in Area 5d, MIP, and VIP, neurons have been reported to encode objects in reference frames intermediate between eye and body ("partially shifting receptive fields")—i.e., the receptive field moves with the eye, but not completely; and eye position modulates the amplitude of the tuning curve ("gain fields") [28–31]. As in those studies, we find examples of retinotopic, body-centered, and partially shifting receptive fields—even fields that shift opposite to the change in gaze angle. Fig. 8B shows examples of all four types (see legend). (We conflate head-centered and body-centered tuning in what follows, since we assume a head-fixed model.)

More recently, Andersen and colleagues [31,32] have proposed to analyze these qualitative descriptions in terms of (1) the "separability" of the tuning curve—whether it can be written f(s,e) = f_s(s)·f_e(e)—and (2) the reference frame in which the neuron is tuned—body, eye, or something intermediate—as measured by the gradient of the tuning in the (s,e) space, since the direction of steepest change indicates the strongest tuning. All and only the neurons with pure gain fields (no shift) will be separable. The extent of receptive-field shift for inseparable neurons is indicated by the gradient analysis.

We reproduce that analysis on our model here. In [31], there is a third variable in addition to hand location and gaze position, namely the target. However, direct comparison between model and data can be made simply by identifying the hand and target. Finally, since all tuning curves were measured in visual space, we do the same; thus we define T_ret := X, the retinotopic hand/target location in visual space, and T_body := L·cos(H), the body-centered hand/target location in visual space, giving the familiar equation T_ret = T_body − E. Fig. 8C shows the resulting histogram of gradient directions, which is qualitatively quite similar to its counterpart, the top panel of Figure 4 of [31]: a peak at T_body, minor peaks at the other "unmixed" stimuli, and representative neurons at all stimulus combinations—except those intermediate between T_body + E and E, where there is a gap in the histogram. Nevertheless, we emphasize that the correspondence between model and data in Fig. 8C should be interpreted with extreme caution: it is possible to obtain different distributions of receptive-field properties with our model; see Text S1: Tuning of the coordinate-transforming neurons for further discussion.

Figure 8. Tuning curves in the hidden layer. (A) Tuning curves for the multisensory-integration model/data (Figs. 1 and 2). The left column shows tuning curves in the space of joint angles for sixteen randomly chosen hidden units; the right column shows those same units for the arc of reach endpoints from [29]. The top row shows tuning curves for the trained model; the second row shows the same curves for the untrained model. The location of the arc in joint space is shown by the black slash through the tuning curves in the left column. Whereas the left-column tuning curves were collected for a single gain (g = 15), the right-column curves were collected for g = 12 (blue), g = 15 (green), and g = 18 (red) (the same gain was used for both populations, VIS and PROP). (B) Example hidden-unit tuning curves from the coordinate-transformation model for body-centered hand position (T_body := L·cos(H); see text for details), for two different gaze positions (red and green curves). The dashed blue curves show where the red tuning curves would lie for the second gaze position if they shifted completely with the eyes, as illustrated by the red arrows, i.e., if they were retinotopic. Some cells (second column) are; some are body-centered (first column); some partially shift (third column); and some even shift in the opposite direction of the gaze angle. (C) Coordinate-transforming cells can be tuned for any of the variables on the continuum from gaze angle (E), to retinotopic hand position (T_ret := X = T_body − E), to body-centered hand position (T_body), to body-centered hand position plus gaze angle (T_body + E). The histogram shows the distribution of such tunings in the hidden layer, using the analysis of [31]. doi:10.1371/journal.pcbi.1003035.g008
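The gradient analysis used for Fig. 8C can be sketched as follows (a simplified version of the analysis of [31]; the grid, the magnitude weighting, and the angle convention are our assumptions):

```python
import numpy as np

def reference_frame_angle(f_grid, s_vals, e_vals):
    """Measure a unit's tuning f(s, e) on a grid of body-centered
    positions s (axis 0) and gaze angles e (axis 1), and summarize
    the direction of steepest change. A pure gain field (separable
    tuning) yields a gradient along s; a retinotopic unit, tuned to
    s - e, yields a diagonal gradient."""
    dfds, dfde = np.gradient(f_grid, s_vals, e_vals)
    gs, ge = dfds.mean(), dfde.mean()   # average gradient components
    return np.degrees(np.arctan2(ge, gs))
```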
Discussion

We have demonstrated a neural-network model of multisensory integration that achieves a number of desirable objectives that have not been captured before in a single model: learning de novo to integrate the mean and covariance of representations of nonlinearly related inputs; learning prior distributions over the encoded stimuli; staged, hierarchical integration; and "coordinate transformations."

Our approach is based on two central ideas. The first, following [2], is that the goal of multisensory integration is not (merely) to encode in the multisensory neurons (v) an optimal point estimate of the stimulus (s) given the activities of the input populations (r = [r^h, r^x]), but to encode an entire (optimal) distribution, so that q(s|v) = p(s|r). This criterion is equivalent to demanding that all information in the input populations about the stimulus—the mean, variance, and higher cumulants, if applicable—be transferred to the multisensory neurons v. Behavior itself corresponds to a single point from this distribution, but the higher cumulants will be necessary for the intervening computations: for example, the variance of the integrated estimate determines how to integrate it optimally with other estimates downstream (see Fig. 7).

The second central idea is that this information-retention criterion will be satisfied by the hidden or "latent" variables, V, of a generative model that has learned how to produce samples from the distribution of its input data, R, a process called latent-variable density estimation. The intuition connecting this learning problem with the seemingly very different task of multisensory integration is that being able to reproduce the input data (up to the noise) requires encoding their "hidden causes"—the features, like hand location, that vary across trials, and thus should be transmitted downstream—in the latent-variable activities. The density estimator will likewise learn to represent the statistical features that do not vary across trials, like prior information, in its weights. Since a network that has learned to reproduce its inputs efficiently will have implicitly learned the underlying relationship between their hidden causes, density estimation also naturally solves other computational problems that arise in multisensory processing: the need to perform coordinate transformations (Fig. 7C), for example, arises because a signal is available that correlates with a transformed version of other variables—like retinotopic object location with the combination of body-centered object location and gaze angle. Efficiently encoding the distribution of the larger set of variables requires learning the coordinate transformation.
With the network implementation of latent-variable density estimation, we have demonstrated how all three of these learning problems—optimal integration, the integration of prior information, and coordinate transformations—can be solved by multisensory neural circuits. We have previously argued that these three operations are exactly those required for planning multisensory-guided reaching movements [23]. There is considerable evidence for multimodal, reaching-related signals across several brain areas in the posterior parietal cortex, including Area 5d, MIP, VIP, V6, and Area 7 [33–38]. We propose that density estimation, driven by latent-variable learning, is the principle underlying the computations performed by these areas. The fact that our network can be hierarchically composed is central to this hypothesis: these brain areas receive overlapping but distinct sets of inputs, with a rough hierarchical organization within them [39–43]. Density estimation on these inputs, then, is expected to yield activity patterns that are also highly overlapping but distinct, as observed, for example, in [29,44]. We have previously argued that having a collection of such representations allows for the flexible and (nearly) optimal use of a wide range of sensory inputs [45].

Implications of the model

One example of a statistical feature that is constant across trials is the prior distribution of the stimulus, which the network therefore learns to encode in its weights. Whether prior distributions in the brain are encoded in synaptic weights [46,47], as a separate neural population [2], or in something else again remains an area of active research (see also Text S1).

An interesting consequence of the present formulation is that it renders the gains random variables (see e.g. Fig. 1A), no less than stimulus location; that is, they represent information that is not constant across trials. This has testable implications for multisensory populations. For an M-dimensional stimulus, the posterior precision (inverse covariance) of the multisensory neurons is an M×M symmetric matrix and therefore has M(M+1)/2 independent entries. But if the precisions of the two input populations are each functions only of a single parameter (their respective gains, reflecting the confidence in each modality), then the multisensory activities need only encode two, rather than M(M+1)/2, numbers on each trial. Conversely, in the case of a one-dimensional stimulus, a population of multisensory neurons ostensibly need only encode the single value of the posterior variance, Var[S | r_1, r_2], but the density-estimation approach predicts that the hidden-unit activities on a given trial will nevertheless encode both of that trial's input-population gains—and indeed they do in our model, albeit imperfectly (Fig. 3A). Testing these predictions experimentally would be straightforward—try to decode unisensory covariances from a multisensory population—but it has never been done.

The question of whether cortical circuits learn to encode any posterior-covariance information at all, as opposed to merely the point estimate that psychophysical experiments elicit, is itself a crucial, open one. Of course, in theory one can always compute a posterior over the stimulus given some population activities [48]; but whether the posterior conditioned on activities deep in the hierarchy matches that conditioned on the activity in early sensory cortices, as in our model, is unknown. Our model also predicts that such constancy would emerge during learning—which could be tested, for instance, by training an animal on a novel multisensory pairing (e.g., audition and touch).

That fewer units are used to represent the same information (half as many in our simple integration model; see Multisensory integration in the RBM), and that the maximum spike count of each hidden neuron is bounded by the maximum mean spike count of the inputs, constrains the amount of information that can be transmitted. This forces the hidden units to represent the information more efficiently—i.e., to "integrate" it. In fact, without that constraint, no learning would be required to satisfy the information-retention criterion: a random N×N weight matrix has rank N almost surely, and the neuron nonlinearities are likewise invertible, so any random set of synaptic connections would suffice (since any invertible transformation is information-preserving). We chose to constrain the multisensory representational capacity, so that the synaptic connections form an N/2 × N matrix, which will not in general preserve stimulus information. One promising theoretical strategy would be to take "passing on all the information" as a given, and then to seek the set of constraints—fewest spikes [49], topography [50], fewest neurons, least processing time, computational efficiency [51], etc.—that yields the most biologically realistic activity patterns in the multisensory units.
Relationship to other work

Multisensory integration was first considered from the standpoint of information theory and unsupervised learning in [52], and in the related work [50], and our approach is similar in spirit, but with important differences. Crucially, a different objective function was minimized: integration was achieved by maximizing the mutual information between the hidden/output units of two neural networks, each representing a modality, forcing these units to represent common information; the latter model additionally constrained topography. In our model, contrariwise, integration is enforced indirectly, by requiring a reduced number of (hidden) units to represent the information in two populations. This allows for greater generality, since it does not require foreknowledge of which populations should be forced to share information: if the information in the input populations is redundant, it will be "integrated" in the hidden units, and conversely. More recently, the idea of treating multisensory integration as a density-estimation problem has been proposed independently by [53], a complementary report that explores both cognitive and neural implications of this view, without proposing an explicit neural implementation. As in [50,52], then, no attempt is made to employ biological learning rules. Most significantly, none of these models invokes the criterion for optimal integration that we have argued to be central—the correct posterior distribution over the stimulus given hidden-unit activities ($q(s \mid \mathbf{v}) = p(s \mid \mathbf{r})$, in the notation of this paper). This approach renders the combination of three signals of two independent causes—coordinate transformation—a matter simply of allowing another population to feed the hidden units; whereas the other models would require something more sophisticated.
More recent models of multisensory integration or cross-modal transformation neglect some combination of the desiderata listed in the introduction. Basis-function networks with attractor dynamics [27,30,54] ignore prior distributions but, more significantly, require hand-wiring (no learning). The models of [46] and [47] extend these attractor networks to include the learning of priors, but even these must be hand-wired and so are practical only for simple representations. Other models of learning [24–26,55] disregard variance information, so that what is learned is essentially a mapping of means; nor, correspondingly, do they account for the learning of priors. The probabilistic population coding model [2] makes explicit the notion of encoding a posterior, but includes no model of learning.

Finally, many authors have either anticipated [51,56,57] or explicitly proposed [58–60] that learning to process early sensory information might be viewed as a form of density estimation. Our work shows that the range of computations that can be assimilated to this statistical problem extends to the acquisition of two key operations for motor planning and control: multisensory integration, even when the underlying stimulus is distributed non-uniformly, and coordinate transformations; and further that these computations can be combined hierarchically, as is observed in the neural circuits underlying these operations.

Methods

Notation is standard: capital letters for random variables, lowercase for their realizations; boldfaced font for vectors, italic for scalars.

Input-data generation

Throughout, we work with the example case of integrating two-dimensional (2D) proprioceptive and visual signals of hand location, but the model maps straightforwardly onto any pair of co-varying sensory signals. These two signals report elbow and shoulder joint angles (PROP, $\Theta$) and fingertip position in Cartesian space (VIS, $X$), respectively. Choosing the forward kinematics, $X = F(\Theta)$, to be invertible renders the variables isomorphic, so that we can refer generically to them as a "stimulus" ($S$), independent of space. The kinematics model for most of the results has joint ranges of $[-\pi/2, \pi/4]$ (shoulder) and $[\pi/4, 3\pi/4]$ (elbow) and limb lengths of 12 (upper arm) and 20 (forearm) cm; see inset of Fig. 1A. The exception is Fig. 7C,D, in which a one-degree-of-freedom (1D) arm was used for simplicity: $X = L\cos(\Theta) - E$, with link length $L = 12$ cm and joint range $[\pi/6, 5\pi/6]$, and $E$ the position of the eye (EYE, gaze angle). Below, we describe data generation from the 2D kinematics; the modifications for 1D are straightforward.
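The exact form of $F$ is not restated in this excerpt; the sketch below assumes a standard two-link planar arm with the stated link lengths, alongside the 1D variant that is given explicitly. It is meant only to make the kinematics concrete, and the function names are illustrative:

```python
import numpy as np

L1, L2 = 12.0, 20.0  # upper-arm and forearm lengths, in cm

def fwd_kin_2d(theta):
    """Assumed two-link planar forward kinematics X = F(Theta):
    theta = (shoulder, elbow) angles -> fingertip position (x, y)."""
    sh, el = theta
    return np.array([L1 * np.cos(sh) + L2 * np.cos(sh + el),
                     L1 * np.sin(sh) + L2 * np.sin(sh + el)])

def fwd_kin_1d(theta, eye, L=12.0):
    """The 1D model of Fig. 7C,D: X = L*cos(Theta) - E."""
    return L * np.cos(theta) - eye
```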
Each training vector consists of a set of spike counts, $[\mathbf{r}^\theta, \mathbf{r}^x]$, generated by choosing a random stimulus ($s$, i.e. $\theta$ and $x$) and a random global gain for each modality ($g^\theta, g^x$), and encoding them in populations of neurons with Gaussian tuning curves ($\mathbf{f}$) and independent Poisson spike counts—a "probabilistic population code" [2]:

$$p(\mathbf{r}^\theta, \mathbf{r}^x \mid s, g^\theta, g^x) = p(\mathbf{r}^\theta \mid \theta, g^\theta)\, p(\mathbf{r}^x \mid x, g^x) = \prod_i \mathrm{Pois}\!\left[r_i^\theta \mid g^\theta f_i(\theta)\right] \prod_i \mathrm{Pois}\!\left[r_i^x \mid g^x f_i(x)\right], \qquad (2)$$

as illustrated in Fig. 1A. Each gain, $g^s$, can be thought of as the confidence in its respective modality, since the posterior covariance of a single, sufficiently large population, $\mathrm{Cov}[S \mid \mathbf{r}^s]$, is inversely proportional to its gain [2].

The tuning curves $\mathbf{f}$ of each population are two-dimensional, isotropic, unnormalized Gaussians, whose width (variance) is $\Sigma_t$, and whose centers form a regular grid over their respective spaces. To avoid clipping effects at the edges, the space spanned by this grid of $N \times N$ neurons is larger than the joint space (or, for VIS, than the reachable workspace). Thus the grid consists of a central "response area" whose neurons can be maximally stimulated, and a "margin" surrounding it whose neurons cannot. The margin width is four tuning-curve standard deviations ($4\Sigma_t^{1/2}$), making spiking of putative neurons outside the grid extremely unlikely even for stimuli at the edge of the response area. In accordance with the broad tuning curves found in higher sensory areas and with previous models of population coding in multisensory areas [2,27], tuning-curve widths were themselves chosen so that their full width at half maximum embraced one-sixth of the response area.

The prior over the stimulus is either uniform or Gaussian in the space of joint angles. (Implementation of the Gaussian prior is detailed in Learning non-flat priors.) Since both dimensions of PROP space are allotted the same number of neurons ($N$) and the tuning curves are isotropic and evenly spaced, but the physical ranges of these dimensions differ ($3\pi/4$ and $\pi/2$ for the shoulder and elbow, resp.), the induced covariance $\mathrm{Cov}[\Theta \mid \mathbf{r}^\theta]$ in the population code is anisotropic, being more precise in elbow than shoulder angle. The nonlinearity of the forward kinematics likewise ensures anisotropy of $\mathrm{Cov}[X \mid \mathbf{r}^x]$; see Fig. 1A. This makes the problem more interesting, anisotropic covariances entailing, for example, optimal estimates that do not lie on the straight-line path between cue means (see e.g. Fig. 1 of [1]).

The priors over the gains, $G^\theta$ and $G^x$, which set the maximum mean spike counts, are independent and uniform between 12 and 18 spikes. Unless otherwise noted, gains in the testing data were drawn from the same distribution as the training-data gains.
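To make the data-generation model of Eq. 2 concrete, the following sketch samples one training vector; for brevity it uses a one-dimensional stimulus and a single population (the 2D case replaces the line of centers with an $N \times N$ grid), and all names and parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def make_centers(lo, hi, n, margin):
    """Regularly spaced tuning-curve centers: a central response area
    [lo, hi] plus a margin on each side to avoid edge clipping."""
    return np.linspace(lo - margin, hi + margin, n)

def ppc_spikes(s, gain, centers, tc_var):
    """Poisson spike counts under Gaussian tuning curves (Eq. 2).
    Tuning curves are unnormalized, so the peak mean count is the gain."""
    f = np.exp(-0.5 * (s - centers) ** 2 / tc_var)
    return rng.poisson(gain * f)

# One training vector for one modality: random stimulus, random gain.
# With tc_var = 0.01 (sd = 0.1), a margin of 0.4 is four tuning-curve sds.
centers = make_centers(np.pi / 4, 3 * np.pi / 4, n=30, margin=0.4)
theta = rng.uniform(np.pi / 4, 3 * np.pi / 4)  # e.g. an elbow angle
g_prop = rng.uniform(12, 18)                   # gain drawn from U(12, 18)
r_prop = ppc_spikes(theta, g_prop, centers, tc_var=0.01)
```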
The optimal posterior distribution over the stimulus

To show that the model works, we must compare two posterior distributions over the stimulus: the posterior conditioned on the input data, $p(s \mid \mathbf{r}^\theta, \mathbf{r}^x)$—i.e. the "true" or "optimal" posterior—and the posterior conditioned on the downstream/integrating units, $q(s \mid \mathbf{v})$ (see The RBM, below). That comparison is easiest to make, and to exhibit, when the optimal posterior is as simple as possible—ideally, a Gaussian, which has only two nonzero cumulants, mean and covariance. With a flat or Gaussian prior over the stimulus, the probabilistic population code that we are using does indeed have an approximately normal posterior for a unimodal population [2]; but to guarantee this for two populations that are encoding the stimulus in different (i.e., nonlinearly related) spaces, the unimodal posterior covariances ($\mathrm{Cov}[X \mid \mathbf{r}^x]$ and $\mathrm{Cov}[\Theta \mid \mathbf{r}^\theta]$) must also be small enough that typical errors lie within the linear regime of the arm kinematics (see Text S1). Given the gain ($G$) regime and the tuning-curve widths ($\Sigma_t$), choosing $N = 30$ neurons in the $N \times N$ grid yields variances between 2 and 9 mm² for the two populations, satisfying the requirement. These values are also comparable to empirical values for visual and proprioceptive localization variances from human psychophysics, 5 mm² and 50 mm², resp. [1]. These latter are in fact an upper bound, since they are with respect to behavior, the furthest-downstream assay of certainty. In any case, we stress that this and other compromises of the population code with biological realism (uniform tiling of the stimulus space, identical tuning curves, etc.) serve to simplify the analyses and interpretation rather than reflecting any limitation of the neural-network model.

Now, whereas a Gaussian posterior requires a flat or Gaussian prior, such a prior in PROP space will induce an irregular prior in VIS space (and vice versa; see again Fig. 1A)—so there can be a Gaussian posterior in only one space. Results are therefore computed in the space of the flat or Gaussian prior. Observing these constraints, the posterior cumulants can be written:

$$\mathrm{Cov}^{-1}[\Theta \mid \mathbf{r}] \approx \Sigma_0^{-1} + g^\theta\, \Sigma_t^{-1} + g^x\, (J \Sigma_t J^\top)^{-1} \qquad (3a)$$

$$\mathrm{E}[\Theta \mid \mathbf{r}] \approx \mathrm{Cov}[\Theta \mid \mathbf{r}] \left[ \Sigma_0^{-1} \mathbf{m}_0 + g^\theta\, \Sigma_t^{-1}\, \mathbf{y}(\mathbf{r}^\theta) + g^x\, (J \Sigma_t J^\top)^{-1} F^{-1}[\mathbf{y}(\mathbf{r}^x)] \right]. \qquad (3b)$$

(See Text S1 for a derivation.) Intuitively, the posterior precision (inverse covariance, Eq. 3a) is a sum of three precisions: the prior precision, $\Sigma_0^{-1}$; the weighted PROP ($\theta$) tuning-curve precision, $\Sigma_t^{-1}$; and the weighted VIS ($x$) tuning-curve precision, $(J \Sigma_t J^\top)^{-1}$. (Since the posterior is expressed over $\Theta$ rather than $X$, the latter's precision must be warped into $\theta$-space by the Jacobian, $J = \partial F^{-1} / \partial \mathbf{x}$, of the inverted forward kinematics, which is evaluated at the center of mass of the proprioceptive population.) The weights are the total spike counts for each population, $g^s := \sum_j r_j^s$, $s = \theta, x$. The posterior mean (Eq. 3b) is a normalized, weighted sum of three estimates: the prior mean, $\mathbf{m}_0$; the center of mass of the $\theta$ population, $\mathbf{y}(\mathbf{r}^\theta)$; and the (transformed) center of mass of the $x$ population, $F^{-1}[\mathbf{y}(\mathbf{r}^x)]$. The weights are the three precisions. The center of mass $\mathbf{y}(\mathbf{r}^s) := \sum_j \mathbf{s}_j r_j^s / \sum_j r_j^s$, with $\mathbf{s}_j$ the $j$th preferred stimulus, is likewise intuitive, being the maximum-likelihood estimate of the stimulus for a single population [61].
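A minimal sketch of the decoder implied by Eq. 3 follows. It assumes the caller supplies the inverse kinematics $F^{-1}$ and the Jacobian $J$ evaluated as described above; the function and argument names are illustrative:

```python
import numpy as np

def optimal_posterior(r_p, r_v, centers_p, centers_v,
                      Sigma_t, Sigma_0, m_0, F_inv, J):
    """Approximate mean and covariance of p(s | r_prop, r_vis) in
    joint-angle space (Eq. 3). J warps VIS precision into theta-space
    and is assumed evaluated at the PROP center of mass."""
    # The weights are the total spike counts of each population.
    g_p, g_v = r_p.sum(), r_v.sum()
    # Centers of mass: the single-population ML estimates of the stimulus.
    y_p = (centers_p * r_p[:, None]).sum(axis=0) / g_p
    y_v = (centers_v * r_v[:, None]).sum(axis=0) / g_v

    prec_0 = np.linalg.inv(Sigma_0)                  # prior precision
    prec_p = g_p * np.linalg.inv(Sigma_t)            # weighted PROP precision
    prec_v = g_v * np.linalg.inv(J @ Sigma_t @ J.T)  # warped VIS precision

    cov = np.linalg.inv(prec_0 + prec_p + prec_v)                     # Eq. 3a
    mean = cov @ (prec_0 @ m_0 + prec_p @ y_p + prec_v @ F_inv(y_v))  # Eq. 3b
    return mean, cov
```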
The RBM

The neural circuit for sensory integration was modeled as a restricted Boltzmann machine, a two-layer, undirected, generative model with no intralayer connections and full interlayer connections (Fig. 1A, bottom right) [17,62]. The input layer ($\mathbf{R}$) consists of Poisson random variables, whose observed values are the population codes just described. The hidden-layer units ($\mathbf{V}$) are binary, indicating whether or not a unit spiked on a given trial, making them Bernoulli random variables. Unless otherwise noted in the results, the number of hidden units in the model is equal to half the number of input units, i.e. the number of units in a single input population—thus forcing the model to represent the same information in half the number of neurons.

During RBM training [17,62], input and hidden units reciprocally drive each other through the same weight matrix:

$$\mathbf{V} \sim q(\mathbf{v} \mid \mathbf{r}) = \prod_i \mathrm{Bern}\!\left[v_i \mid \sigma(\{W\mathbf{r} + \mathbf{b}_v\}_i)\right] \qquad (4a)$$
$$\mathbf{R} \sim q(\mathbf{r} \mid \mathbf{v}) = \prod_j \mathrm{Pois}\!\left[r_j \mid \exp(\{W^\top\mathbf{v} + \mathbf{b}_r\}_j)\right], \qquad (4b)$$

which corresponds to Gibbs sampling from the joint distribution represented by the machine. Here $\{\mathbf{z}\}_i$ is the $i$th entry of the vector $\mathbf{z}$; $\mathbf{b}_v$ and $\mathbf{b}_r$ are, respectively, the vectors of biases for the hidden and observed units; $W$ is the matrix of synaptic strengths; and $\sigma(x) := 1/(1 + e^{-x})$ is the logistic (sigmoid) function. (The lack of intralayer connections is what allows the entire joint to be sampled in just two steps.) As in a standard stochastic neural network, each unit's mean activity is a nonlinear transformation of a weighted sum of its inputs. To ensure that this mean is in the support of its associated exponential-family distribution, the nonlinearities are chosen to be the inverse "canonical links" [63]: the logistic function for the Bernoulli hidden units, and the exponential function for the Poisson input units. (Technically, the use of Poisson input units makes the model an "exponential-family harmonium" [62] rather than a restricted Boltzmann machine, which would have all Bernoulli units.) The activity (presence of a spike, or spike count) is sampled from this mean.

Training

Weights and biases were initialized randomly, after which the networks were trained on batches of 40,000 vectors, with weight changes made after computing statistics on mini-batches of 40 vectors apiece. One cycle through all 1000 mini-batches constitutes an "epoch," and learning was repeated on a batch for 15 epochs, after which the learning rates were lowered by a factor of $\sqrt{10}$; this process continued, for a total of 90 epochs, after which learning was terminated. (The number of epochs and the learning-rate annealing schedule were determined empirically.) Weight and bias changes were made according to one-step contrastive divergence [16,17]:

$$\Delta W \propto \langle \mathbf{r}\mathbf{v}^\top \rangle_{p(\mathbf{r})\,q(\mathbf{v}\mid\mathbf{r})} - \langle \hat{\mathbf{r}}\hat{\mathbf{v}}^\top \rangle_{q(\hat{\mathbf{r}}\mid\mathbf{v})\,q(\hat{\mathbf{v}}\mid\hat{\mathbf{r}})}$$
$$\Delta \mathbf{b}_r \propto \langle \mathbf{r} \rangle_{p(\mathbf{r})\,q(\mathbf{v}\mid\mathbf{r})} - \langle \hat{\mathbf{r}} \rangle_{q(\hat{\mathbf{r}}\mid\mathbf{v})} \qquad (5)$$
$$\Delta \mathbf{b}_v \propto \langle \mathbf{v} \rangle_{p(\mathbf{r})\,q(\mathbf{v}\mid\mathbf{r})} - \langle \hat{\mathbf{v}} \rangle_{q(\hat{\mathbf{r}}\mid\mathbf{v})\,q(\hat{\mathbf{v}}\mid\hat{\mathbf{r}})},$$

where the circumflexes differentiate the zeroth (no hat) and first (hat) steps of Gibbs sampling. That is, the input data ($\mathbf{r}$) are propagated up into the multisensory (hidden) layer ($\mathbf{v}$), back down into the input units ($\hat{\mathbf{r}}$), then back up into the multisensory neurons ($\hat{\mathbf{v}}$); see Fig. 1B. This is repeated for all the data (that is, for each $\mathbf{r}^\theta, \mathbf{r}^x$ drawn from Eq. 2, for each stimulus and set of gains drawn from $p(s)$ and $p(\mathbf{g})$). The change in the weight connecting neuron $i$ to neuron $j$ is thus proportional to the difference between these two pairwise correlations—a Hebbian and an anti-Hebbian term. This rule approximates gradient descent on an objective function for density estimation (Hinton's "contrastive divergence" [17], or alternatively "probability flow" [64]).
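A minimal implementation of one mini-batch update under Eqs. 4 and 5 might look as follows. One common simplification, used here, is to take the mean rather than a sample for the final hidden-unit step; all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def cd1_step(R, W, b_v, b_r, lr=1e-4):
    """One-step contrastive divergence on a mini-batch.
    R: (batch, n_in) Poisson spike counts; W: (n_hid, n_in)."""
    V = rng.binomial(1, sigmoid(R @ W.T + b_v))  # up: sample hiddens (Eq. 4a)
    R_hat = rng.poisson(np.exp(V @ W + b_r))     # down: reconstruct (Eq. 4b)
    V_hat = sigmoid(R_hat @ W.T + b_v)           # up again (mean field)
    batch = len(R)
    # Hebbian minus anti-Hebbian correlations (Eq. 5).
    W += lr * (V.T @ R - V_hat.T @ R_hat) / batch
    b_v += lr * (V - V_hat).mean(axis=0)
    b_r += lr * (R - R_hat).mean(axis=0)
    return W, b_v, b_r
```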
Although this specific learning rule has not been documented in vivo, it is constructed entirely of components that have been: changes in firing rate based on (local) correlations between pre- and postsynaptic spike counts. Anti-Hebbian learning has been observed in a neural circuit [65], albeit not in mammalian cortex, and plausible cellular mechanisms for it have been described [66].

Testing

After training, learning was turned off, and the network was tested on a fresh batch of 40,000 data vectors (Fig. 1B): stimuli were again drawn uniformly from the grid of joint angles, and the corresponding spike counts simulated by drawing from the two populations of Gaussian-tuned, Poisson neurons. For each input vector, hidden-layer activities were computed by drawing 15 sample vectors (from $p(\mathbf{v} \mid \mathbf{r})$) and averaging them. Since the input gains are between 12 and 18, and assuming that hidden and input units integrate information over the same-sized time window from the past, this implies that hidden neurons fire no faster than input neurons—which would otherwise constitute a violation of the information bottleneck. This is essential for our task, since we require an efficient coding, not merely a different one.

For each trial, decoding the hidden vector consists of estimating from it the mean and covariance of the optimal posterior $p(s \mid \mathbf{r})$—that is, all the information in the network about the stimulus. Generally, finding a good decoder can be hard; but because the network is a generative model, we can use its generative (hidden-to-input) weights to turn the hidden vector back into expected input spike counts ($\mathrm{E}[\mathbf{R}^\theta, \mathbf{R}^x \mid \mathbf{v}]$)—which we know how to decode: Eq. 3. In practice, it often turns out that the weighted sum in Eq. 3b is unnecessary: the center of mass from a single (updated) population suffices. When showing results in joint angles, we take the center of mass of the PROP population; likewise for Cartesian space and VIS. Also, reconstruction of the total spike counts was mildly improved by first mapping them to the true (input) total spike counts via a standard neural network; in cases where this final step was applied (Fig. 3A), training and testing used different data. The posterior covariances used in Fig. 3B–D, however, did not use any such trained decoder; they were reconstructed just as the posterior means were, i.e. by using the generative weights and then applying Eq. 3a.
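The decoding pipeline just described can be sketched compactly; splitting the resulting expected counts into the two populations, and applying Eq. 3, proceed as above (names are illustrative):

```python
import numpy as np

def expected_inputs(r, W, b_v, b_r, n_samples=15,
                    rng=np.random.default_rng(3)):
    """Average n_samples hidden vectors drawn per Eq. 4a, then use the
    generative (hidden-to-input) weights to form E[R | v] (Eq. 4b),
    which can in turn be decoded with Eq. 3."""
    p_v = 1.0 / (1.0 + np.exp(-(W @ r + b_v)))                    # Eq. 4a means
    v_bar = rng.binomial(1, p_v, size=(n_samples, len(p_v))).mean(axis=0)
    return np.exp(W.T @ v_bar + b_r)                              # Eq. 4b mean
```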
Supporting Information

Figure S1. Probabilistic graphical models. The neural populations have been collapsed to single nodes. (A) A directed model for the data for multisensory integration. (B) A model that captures the independence statements characterizing coordinate transformations. (C) A model that captures the case where one population ($\mathbf{R}_1$) sometimes reports one stimulus, sometimes the other, as determined by $T$. (EPS)

Figure S2. Coordinate-transformation tuning curves under different sampling schemes. (A) The scattered black dots are sample pairs of body-centered hand position ($T_{\text{body}} := L\cos(\theta)$) and gaze angle ($E$) that were generated from the graphical model at the bottom of Fig. 7C. Since $E$ and $X = T_{\text{body}} - E$ were sampled from uniform distributions on lines, the resulting space is a parallelogram. Depending on which rectangular subregion is selected (red, green, blue), different histograms of tunings result—(B), (C), and (D), respectively. See text for details of the analysis. (EPS)

Text S1. Derivation of the optimal posterior for multisensory integration, coordinate transformation, and sometimes-decoupled inputs; notes on the fractional information loss; a rationale for the number of hidden units; and a note on the tuning of coordinate-transforming neurons. (PDF)

Acknowledgments

Base code for training a deep belief network with contrastive divergence was taken from Salakhutdinov and Hinton [67]. Jeff Beck helpfully suggested the fractional information loss measure.

Author Contributions

Conceived and designed the experiments: JGM MRF. Performed the experiments: JGM. Analyzed the data: JGM. Wrote the paper: JGM PNS. Supplied the intuitions: MRF. Supplied the concepts: JGM.

References

1. van Beers RJ, Sittig A, van der Gon JJD (1999) Integration of proprioceptive and visual position information: An experimentally supported model. Journal of Neurophysiology 81: 1355–1364.
2. Ma WJ, Beck JM, Latham PE, Pouget A (2006) Bayesian inference with probabilistic population codes. Nature Neuroscience 9: 1423–1438.
3. Ernst MO, Banks MS (2002) Humans integrate visual and haptic information in a statistically optimal fashion. Nature 415: 429–433.
4. Alais D, Burr D (2004) The ventriloquist effect results from near-optimal bimodal integration. Current Biology 14: 257–62.
5. Kording KP, Wolpert DM (2004) Bayesian integration in sensorimotor learning. Nature 427: 244–7.
6. Stocker AA, Simoncelli EP (2006) Noise characteristics and prior expectations in human visual speed perception. Nature Neuroscience 9: 578–85.
7. Knill DC, Pouget A (2004) The Bayesian brain: the role of uncertainty in neural coding and computation. Trends in Neurosciences 27: 712–719.
8. Held R, Freedman SJ (1963) Plasticity in human sensorimotor control. Science 142: 455–462.
9. Knudsen EI, Knudsen PF (1989) Vision calibrates sound localization in developing barn owls. Journal of Neuroscience 9: 3306–13.
10. Ghahramani Z, Wolpert DM, Jordan MI (1996) Generalization to local remappings of the visuomotor coordinate transformation. Journal of Neuroscience 16: 7085–7096.
11. Simani M, McGuire LMM, Sabes PN (2007) Visual-shift adaptation is composed of separable sensory and task-dependent effects. Journal of Neurophysiology 98: 2827.
12. Redding GM, Rossetti Y, Wallace B (2005) Applications of prism adaptation: a tutorial in theory and method. Neuroscience and Biobehavioral Reviews 29: 431–44.
13. Held R, Hein A (1963) Movement-produced stimulation in the development of visually guided behavior. Journal of Comparative and Physiological Psychology 56: 872–876.
14. Sur M, Pallas SL, Roe AW (1990) Cross-modal plasticity in cortical development: differentiation and specification of sensory neocortex. Trends in Neurosciences 13: 341–345.
15. Lyckman AW, Sur M (2002) Role of afferent activity in the development of cortical specification. Results and Problems in Cell Differentiation 39: 139–156.
16. Hinton GE, Osindero S, Teh Y (2006) A fast learning algorithm for deep belief nets. Neural Computation 18: 1527–1554.
17. Hinton GE (2002) Training products of experts by minimizing contrastive divergence. Neural Computation 14: 1771–1800.
18. Sober S, Sabes PN (2003) Multisensory integration during motor planning. Journal of Neuroscience 23: 6982–6992.
19. Fetsch CR, Pouget A, DeAngelis GC, Angelaki DE (2012) Neural correlates of reliability-based cue weighting during multisensory integration. Nature Neuroscience 15: 146–54.
20. Sober S, Sabes PN (2005) Flexible strategies for sensory integration during motor planning. Nature Neuroscience 8: 490–497.
21. Körding KP, Beierholm U, Ma WJ, Quartz S (2007) Causal inference in multisensory perception. PLoS One 2: e943.
22. Ernst MO, Bülthoff HH (2004) Merging the senses into a robust percept. Trends in Cognitive Sciences 8: 162–9.
23. McGuire LMM, Sabes PN (2009) Sensory transformations and the use of multiple reference frames for reach planning. Nature Neuroscience 12: 1056–61.
24. Davison AP, Frégnac Y (2006) Learning cross-modal spatial transformations through spike timing-dependent plasticity. Journal of Neuroscience 26: 5604–5615.
25. Xing J, Andersen RA (2000) Models of the posterior parietal cortex which perform multimodal integration and represent space in several coordinate frames. Journal of Cognitive Neuroscience 12: 601–14.
26. Salinas E, Abbott LF (1995) Transfer of coded information from sensory to motor networks. Journal of Neuroscience 15: 6461–6474.
27. Denève S, Latham PE, Pouget A (2001) Efficient computation and cue integration with noisy population codes. Nature Neuroscience 4: 826–831.
28. Duhamel JR, Bremmer F, Ben Hamed S, Graf W (1997) Spatial invariance of visual receptive fields in parietal cortex neurons. Nature 389: 845–8.
29. McGuire LMM, Sabes PN (2011) Heterogeneous representations in the superior parietal lobule are common across reaches to visual and proprioceptive targets. Journal of Neuroscience 31: 6661–73.
30. Avillac M, Denève S, Olivier E, Pouget A, Duhamel JR (2005) Reference frames for representing visual and tactile locations in parietal cortex. Nature Neuroscience 8: 941–949.
31. Bremner LR, Andersen RA (2012) Coding of the reach vector in parietal Area 5d. Neuron 75: 342–351.
32. Pesaran B, Nelson MJ, Andersen RA (2006) Dorsal premotor neurons encode the relative position of the hand, eye, and goal during reach planning. Neuron 51: 125–34.
33. Buneo CA, Andersen RA (2006) The posterior parietal cortex: sensorimotor interface for the planning and online control of visually guided movements. Neuropsychologia 44: 2594–606.
34. Duhamel JR, Colby CL, Goldberg ME (1998) Ventral intraparietal area of the macaque: congruent visual and somatic response properties. Journal of Neurophysiology 79: 126–136.
35. Ferraina S, Johnson PB, Garasto MR, Ercolani L, Bianchi L, et al. (1997) Combination of hand and gaze signals during reaching: Activity in parietal area 7m of the monkey. Journal of Neurophysiology 77: 1034–1038.
36. Galletti C, Gamberini M, Kutz DF, Fattori P, Luppino G, et al. (2001) The cortical connections of area V6: an occipito-parietal network processing visual information. European Journal of Neuroscience 13: 1572–88.
37. Graziano MS (1999) Where is my arm? The relative role of vision and proprioception in the neuronal representation of limb position. PNAS 96: 10418–.
38. Shipp S, Blanton M, Zeki S (1998) A visuo-somatomotor pathway through superior parietal cortex in the macaque monkey: cortical connections of areas V6 and V6A. European Journal of Neuroscience 10: 3171–93.
39. Battaglia-Mayer A, Caminiti R, Lacquaniti F, Zago M, et al. (2003) Multiple levels of representation of reaching in the parieto-frontal network. Cerebral Cortex 13: 1009–1022.
40. Graziano MSA, Gross CG (1998) Spatial maps for the control of movement. Current Opinion in Neurobiology 8: 195–201.
41. Johnson PB, Ferraina S, Bianchi L, Caminiti R (1996) Cortical networks for visual reaching: physiological and anatomical organization of frontal and parietal lobe arm regions. Cerebral Cortex 6: 102–19.
42. Lewis JW, Van Essen DC (2000) Corticocortical connections of visual, sensorimotor, and multimodal processing areas in the parietal lobe of the macaque monkey. Journal of Comparative Neurology 428: 112–37.
43. Wise SP, Boussaoud D, Johnson PB, Caminiti R (1997) Premotor and parietal cortex: corticocortical connectivity and combinatorial computations. Annual Review of Neuroscience 20: 25–42.
44. Chang SWC, Snyder LH (2010) Idiosyncratic and systematic aspects of spatial representations in the macaque parietal cortex. PNAS 107: 7951–6.
45. Sabes PN (2011) Sensory integration for reaching: Models of optimality in the context of behavior and the underlying neural circuits. Progress in Brain Research 191: 195–209.
46. Wu S, Amari S (2005) Computing with continuous attractors: stability and online aspects. Neural Computation 17: 2215–39.
47. Verstynen T, Sabes PN (2011) How each movement changes the next: an experimental and theoretical study of fast adaptive priors in reaching. Journal of Neuroscience 31: 10050–9.
48. Földiák P (1993) The 'ideal homunculus': Statistical inference from neural population responses. In: Eeckman FH, Bower JM, editors. Computation and Neural Systems. Norwell, MA: Kluwer Academic Publishers. Chapter 9, pp. 55–60.
49. Olshausen BA, Field D (1997) Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research 37: 3311–3325.
50. Ghahramani Z (1995) Factorial learning and the EM algorithm. In: Tesauro G, Touretzky DS, Leen TK, editors. Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press.
51. Barlow HB (1961) Possible principles underlying the transformation of sensory messages. Sensory Communication 1: 217–234.
52. Becker S, Hinton GE (1992) Self-organizing neural network that discovers surfaces in random-dot stereograms. Nature 355: 161–163.
53. Yildirim I, Jacobs RA (2012) A rational analysis of the acquisition of multisensory representations. Cognitive Science 36: 305–32.
54. Pouget A, Denève S, Duhamel JR (2002) A computational perspective on the neural basis of multisensory spatial representations. Nature Reviews Neuroscience 3: 1–7.
55. Burnod Y, Grandguillaume P, Otto I, Ferraina S, Johnson PB, et al. (1992) Visuomotor transformations underlying arm movements toward visual targets: a neural network model of cerebral cortical operations. Journal of Neuroscience 12: 1435–53.
56. Attneave F (1954) Some informational aspects of visual perception. Psychological Review 61: 183–93.
57. Barlow HB (2001) Redundancy reduction revisited. Network (Bristol, England) 12: 241–53.
58. Lewicki MS, Olshausen BA (1999) Probabilistic framework for the adaptation and comparison of image codes. J Opt Soc Am 16: 1587–1601.
59. Lewicki MS (2002) Efficient coding of natural sounds. Nature Neuroscience 5: 356–363.
60. Eichhorn J, Sinz F, Bethge M (2009) Natural image coding in V1: how much use is orientation selectivity? PLoS Computational Biology 5: 1–16.
61. Dayan P, Abbott L (2001) Theoretical Neuroscience. Cambridge, MA: MIT Press. pp. 101–106.
62. Welling M, Rosen-Zvi M, Hinton GE (2004) Exponential family harmoniums with an application to information retrieval. In: Neural Information Processing Systems 17. pp. 1481–1488.
63. McCullagh P, Nelder JA (1989) Generalized Linear Models. 2nd edition. London: Chapman and Hall/CRC. pp. 26–32.
64. Sohl-Dickstein J, Battaglino P, DeWeese MR (2011) Minimum probability flow learning. Proc. ICML 2011: 905–912.
65. Bell CC, Caputi A, Grant K, Serrier J (1993) Storage of a sensory pattern by anti-Hebbian synaptic plasticity in an electric fish. PNAS 90: 4650–4654.
66. Lisman J (1989) A mechanism for the Hebb and the anti-Hebb processes underlying learning and memory. PNAS 86: 9574–8.
67. Hinton GE, Salakhutdinov RR (2006) Reducing the dimensionality of data with neural networks. Science 313: 504–507.