Imagine a game of dodgeball, in which your attention is consumed by whizzing balls and the warlike cries of aggressive athletes, all reverberating off the gymnasium walls. Suddenly, you hear someone on the opposing side yell to his teammates that his next shot will be aimed at you. At that instant, you look over at his side of the court and see three hostile-looking boys holding dodgeballs, approaching the half line, mouths moving as they boldly communicate strategy and support. Which one identified you as his target? Luckily, the same voice continues its warmongering, designating targets for his other teammates, and you are now able to match the movement of the mouth of one of the boys with the relevant sounds. Sound and sight have come together, and you are prepared for the appropriate ball.
Amidst that bombardment of sensory information, how did your brain find the appropriate auditory-visual correspondence to determine the origin of the battle cry? At every moment, in dodgeball and in life, our appreciation of the external world is due to a combination of sights, smells, sounds, touches, and tastes. Our brains must integrate this deluge of information to generate a coherent, seamless picture of the environment; this process is called multisensory integration. When integrated properly, the simultaneous acquisition of information from different sources helps us refine our percept of the world. However, our incoming sensory information is often fraught with uncertainty.
To explore this concept of sensory uncertainty, it is useful to focus on one sensory modality, such as vision. Getting back to the game: you know where the ball is coming from, but you still need to dodge (or catch!) it. Your assailant winds his bulging arm back like a catapult and mechanically releases the projectile; as it careens towards you, you attempt to calculate its speed and trajectory. Your eyes convey imperfect information about the ball's velocity, so your brain can only estimate it. Combining this information with your memory of his previous throws reduces the error in this estimate. Not all velocities are equally probable: over the course of the game, his throws form a probability distribution of velocities. Your best estimate, and with it your ability to dodge the ball, results from combining information about this prior distribution of velocities with evidence from sensory (visual) feedback. This way of combining prior knowledge with new evidence is called "Bayesian inference," and various studies have shown that the human brain performs Bayesian inference at a nearly optimal level.
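For readers who like to see the arithmetic, here is a minimal sketch of that inference on a grid of candidate velocities. Everything in it is an assumption made for illustration: the function name `infer_velocity`, the bell-shaped (Gaussian) form of both the prior and the visual noise, and the numbers themselves (past throws averaging 20 m/s, a visual reading of 26 m/s).

```python
import math

def gaussian(x, mean, sd):
    """Height of a bell curve with the given mean and spread at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def infer_velocity(prior_mean, prior_sd, observed, sensory_sd):
    """Bayes' rule on a grid: posterior is proportional to prior times likelihood."""
    velocities = [v * 0.5 for v in range(0, 81)]  # candidates: 0 to 40 m/s
    # Prior: how probable each velocity is, judging from earlier throws.
    prior = [gaussian(v, prior_mean, prior_sd) for v in velocities]
    # Likelihood: how probable the visual reading is if the true velocity were v.
    likelihood = [gaussian(observed, v, sensory_sd) for v in velocities]
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    posterior = [u / total for u in unnormalized]
    best = velocities[posterior.index(max(posterior))]
    return best, posterior

# Hypothetical numbers: a noisy memory of past throws, a sharper visual reading.
best, posterior = infer_velocity(prior_mean=20.0, prior_sd=4.0,
                                 observed=26.0, sensory_sd=2.0)
print(best)
```

With these made-up numbers, the best estimate lands between the remembered average and the visual reading, and closer to the reading because it was assumed to be the less noisy of the two.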
Multisensory integration becomes far more complex when we consider this uncertainty inherent to our sensory information. Nevertheless, our brains are intriguingly capable of weighting different sensory signals according to their corresponding reliabilities; that is, our brains pay more attention to "reliable" sensory information, while discounting "unreliable" information. This ability results in an "optimal" approximation of reality: we are (nearly) perfect maximum-likelihood integrators.
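One common way to write down maximum-likelihood integration, assuming each cue's noise is independent and bell-shaped, is to weight every cue by the inverse of its variance. The sketch below follows that textbook formulation; the function name `integrate_cues` and the example numbers are hypothetical.

```python
def integrate_cues(estimates, variances):
    """Reliability-weighted average of several noisy estimates.

    Assumes independent Gaussian noise, so each cue's weight is
    its reliability, defined here as 1 / variance.
    """
    reliabilities = [1.0 / v for v in variances]
    total = sum(reliabilities)
    combined = sum(r * x for r, x in zip(reliabilities, estimates)) / total
    # The combined estimate is more reliable than any single cue.
    combined_variance = 1.0 / total
    return combined, combined_variance

# Hypothetical cues: a sharp one reporting 0.0 and a blurry one reporting 10.0.
combined, combined_variance = integrate_cues([0.0, 10.0], [1.0, 4.0])
print(combined, combined_variance)
```

In this example the sharp cue carries four times the weight of the blurry one, so the combined estimate sits at about 2.0, and its variance (about 0.8) is smaller than that of either cue alone. That shrinking variance is the payoff of integrating the senses rather than relying on just one.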
The complexity of this integration process is exposed when our perceptual world does not correspond to reality. One striking example of this vulnerability is ventriloquism. A good ventriloquist will thwart our multisensory integration process by synchronizing the movements of a puppet's mouth with his or her voice, while the movements of his or her mouth are imperceptible. Thus, we perceive the voice as originating from the puppet, as opposed to the ventriloquist.
This deception is a consequence of our brain's propensity to give more weight to visual information than auditory information during the integration process; the neural circuits have adapted to the fact that the visual system is far more reliable at determining location than the auditory system. The direction of a light source is directly determined by the position stimulated on the retina, whereas the direction of a sound is calculated by the differences in timing and intensity of stimulation in one ear relative to the other. Thus, our brain "trusts" the visual system more than the auditory system, and rightly so. If there is a discrepancy between the two, the visual information is favored in the generation of a unified percept of reality, and the puppet "speaks."
So, the big question is, as always: what are the neural mechanisms underlying this process? How does the brain weigh different signals according to their corresponding reliabilities when generating the most realistic percept? How is uncertainty represented at the neural level?
A group in Alex Pouget's laboratory recently published theoretical answers to these questions in Nature Neuroscience. The premise for their exploration was the fact that neurons in the cortex respond to identical stimuli with high variability. For example, think of a neuron in the visual cortex that responds to an object moving from left to right. When exposed to such a stimulus, it will not respond exactly the same way each time: the same neuron may respond by firing 9 times, or 14 times, or not at all. Although this particular cell is, on average, activated by left-to-right motion, its response to this stimulus may change dramatically from one presentation to the next.
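To get a feel for this variability, here is a small simulation of a single neuron shown the same stimulus many times. It assumes the spike counts follow a Poisson distribution, a common simplification in this kind of modeling; the names `poisson_sample` and `present_stimulus` and the average of 10 spikes per presentation are invented for the example.

```python
import math
import random

def poisson_sample(rate, rng):
    """Draw one Poisson-distributed spike count (Knuth's multiplication method)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def present_stimulus(mean_rate, n_trials, seed=0):
    """Spike counts of one model neuron across identical presentations."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [poisson_sample(mean_rate, rng) for _ in range(n_trials)]

# Assumed average response: 10 spikes per presentation.
counts = present_stimulus(mean_rate=10.0, n_trials=2000)
mean_count = sum(counts) / len(counts)
variance = sum((c - mean_count) ** 2 for c in counts) / len(counts)

print(counts[:10])           # the first ten presentations differ from one another
print(mean_count, variance)  # for Poisson counts these two are about equal
```

The stimulus never changes, yet the count does; only the average across many presentations is stable.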
Pouget's group hypothesized that this variability may represent sensory uncertainty. Let’s return our focus to the visual system and dodgeball. Your uncertainty about the speed at which the dodgeball is moving is related to the fact that neurons in your visual cortex do not fire in exactly the same way every time you see a ball moving towards you. Balls flying at your head can look different depending on your vantage point (and other factors, such as the physical properties of the ball itself), and thus give rise to different responses in your visual cortex every time. If an approaching dodgeball always elicited the same neural responses, you would be able to determine its speed with certainty, by recalling the ball's speed on past occasions.
The researchers showed mathematically that this variability could represent probability distributions for an object's location. Greater uncertainty (i.e., a wider probability distribution) would thus be represented by higher variability; for locating an object, that means more variable responses among neurons in the auditory cortex than among those in the visual cortex. This internal representation of sensory uncertainty gives the brain a relatively straightforward (linear) way to combine neural activities: the Bayesian “decoder” of the brain can simply pool the probability functions of multiple neurons (which can represent multiple sensory channels) together to generate an optimal inference of an object's location.
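The linear pooling described above can be illustrated with a toy model. This is a sketch under stated assumptions, not the model from the paper: the neurons have bell-shaped tuning curves that tile space densely, their spike counts are independent and Poisson, and the prior over location is flat. All names (`population_response`, `decode`, the `gain` parameter) and all numbers are hypothetical. In this toy model a lower gain stands in for a less reliable sense, since small Poisson counts are noisier relative to their size.

```python
import math
import random

def poisson_sample(rate, rng):
    """Draw one Poisson-distributed spike count (Knuth's multiplication method)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

PREFERRED = list(range(-40, 41, 2))       # each neuron's preferred location (degrees)
WIDTH = 10.0                              # tuning-curve width (degrees)
GRID = [g * 0.5 for g in range(-40, 41)]  # candidate locations, -20 to 20 degrees

def population_response(location, gain, rng):
    """Noisy spike counts from a population with Gaussian tuning curves."""
    return [poisson_sample(gain * math.exp(-0.5 * ((location - p) / WIDTH) ** 2), rng)
            for p in PREFERRED]

def decode(counts):
    """Posterior over location given spike counts (flat prior).

    The log posterior is a weighted sum of the spike counts, which
    relies on the dense-tiling assumption stated in the text.
    """
    log_post = [sum(r * -0.5 * ((s - p) / WIDTH) ** 2
                    for r, p in zip(counts, PREFERRED)) for s in GRID]
    peak = max(log_post)
    unnormalized = [math.exp(v - peak) for v in log_post]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

rng = random.Random(0)
vision = population_response(0.0, gain=20.0, rng=rng)    # strong, reliable signal
audition = population_response(10.0, gain=5.0, rng=rng)  # weak, unreliable signal

# Pooling is just adding the two populations' spike counts, neuron by neuron.
post_sum = decode([v + a for v, a in zip(vision, audition)])

# Reference: multiply the two separately decoded posteriors and renormalize.
product = [pv * pa for pv, pa in zip(decode(vision), decode(audition))]
post_product = [p / sum(product) for p in product]

max_gap = max(abs(a - b) for a, b in zip(post_sum, post_product))
peak_location = GRID[post_sum.index(max(post_sum))]
print(max_gap, peak_location)
```

Adding the spike counts gives the same answer as multiplying the two probability functions, up to rounding error, so under these assumptions a simple sum of neural activity is enough to perform the Bayesian combination. The peak of the pooled estimate falls much nearer the location signaled by the high-gain population.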
So when watching a ventriloquist act, our visual system detects the movement of a puppet’s mouth, which neurons in our visual cortex register with low variability (high certainty and a narrower, more mathematically dominant probability function), while the neurons in our auditory cortex represent sound originating from the mouth of the ventriloquist with high variability (low certainty and a wider, less influential probability function). When the brain combines these functions together with its Bayesian decoder, the visual system "wins" and we think the sound came from the puppet.