The primary aim of this chapter will be to review the physiological mechanisms that are
involved in two basic and extraordinarily important functions of the auditory system: (1)
conversion of the vibratory energy that reaches the ear drum into a series of neural impulses on
the auditory nerve (this is called transduction), and (2) the spectrum analysis function of the
auditory system; that is, the ability of the auditory system to break a complex sound wave into its
individual frequency components.
Before getting into the details, it might be useful to consider some of the fundamental
capabilities of the auditory system which, from any point of view, are nothing short of
awe-inspiring. A brief and by no means exhaustive list appears below.
- The faintest sound that can be detected by the human ear is so weak that it moves the ear
  drum a distance that is equivalent to one-tenth the diameter of a hydrogen molecule. If the
  ear were slightly more sensitive we would hear the random particle oscillations known as
  Brownian motion.
- The most intense sound that can be heard without causing pain is approximately 140 dB
  more intense than a barely detectable sound. This means that the dynamic range of the
  ear (the ratio of the most intense sound that can be heard without pain to the intensity of a
  barely audible sound) is an astounding 100 trillion to 1.
- The frequency range of human hearing runs from approximately 20 Hz to 20,000 Hz, a
  range of about 10 octaves.
- For signal levels approximating conversational speech, the ear can detect frequency
  differences that are on the order of 0.1%, or approximately 1 Hz for a 1,000 Hz test signal
  (Wier, Jesteadt, & Green, 1977).
- Later in the chapter we will see that the auditory system utilizes an elegant mechanism that
  delivers sounds of different frequencies to different physical locations along the cochlea;
  i.e., a sound of one frequency will produce the greatest neural activity at one physical
  location while a sound of a slightly different frequency will activate a different location.
  The difference in frequency that a listener can barely detect corresponds to a difference in
  physical location along the cochlea of about 10 microns (1 micron = one millionth of a
  meter, or one thousandth of a millimeter). This distance, in turn, is approximately the width
  of a single auditory receptor cell (Davis and Silverman, 1970).
- Under ideal conditions listeners can detect intensity differences as small as 0.6 dB (Gulick,
  Gescheider, and Frisina, 1989).
- Listeners can locate the source of a sound based on differences in the time of arrival
  between the two ears that are as small as 10 μs (i.e., 10 millionths of a second).
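Several of the figures in this list follow from simple arithmetic, and it may help to see that arithmetic written out. The short Python sketch below is only an illustration; the function and variable names are our own and do not come from the chapter. It converts the 140 dB dynamic range into an intensity ratio, counts the octaves between 20 Hz and 20,000 Hz, and computes 0.1% of a 1,000 Hz test signal.

```python
import math


def db_to_intensity_ratio(db):
    """Convert a level difference in dB into an intensity ratio."""
    return 10 ** (db / 10)


def octaves_between(f_low, f_high):
    """Number of octaves (doublings of frequency) separating two frequencies."""
    return math.log2(f_high / f_low)


# 140 dB corresponds to a ratio of 10^14, i.e. 100 trillion to 1.
dynamic_range = db_to_intensity_ratio(140)

# 20 Hz to 20,000 Hz is a factor of 1,000, which is just under 10 doublings.
hearing_span = octaves_between(20, 20_000)

# A 0.1% change in a 1,000 Hz test signal is a change of 1 Hz.
smallest_step_hz = 0.001 * 1_000

print(f"dynamic range: {dynamic_range:.0e} to 1")
print(f"hearing span: {hearing_span:.2f} octaves")
print(f"just-detectable change at 1,000 Hz: {smallest_step_hz:.1f} Hz")
```

The octave count comes out slightly below 10, which is why the text describes the range as "about 10 octaves."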
Further, the anatomy that supports this processing is a miracle of miniaturization. For example,
the middle ear cavity is approximately 3 mm in width and approximately 15 mm in the vertical
dimension (Zemlin, 1968), with roughly the volume of a sugar cube. The cochlea, which
contains the auditory receptors, is even smaller, at approximately 5 mm in height and
approximately 9 mm in diameter at its widest point (Gelfand, 1990).
Figure 4-1. The three functional subdivisions of the auditory system. Reprinted from
Deutsch and Richards (1979).
Overview of the Auditory System
The auditory system can be divided into three major functional subsystems: the conductive
mechanism, the sensorineural mechanism, and the central auditory system (see Figure 4-1).
In terms of anatomical structures, the conductive mechanism consists of the pinna, the ear canal
(also known as the external auditory meatus), the ear drum (also known as the tympanic
membrane), and the middle ear, which contains three very small bones called the auditory
ossicles. The primary function of the conductive mechanism is to transmit the vibrations that are
picked up at the tympanic membrane to the structures of the inner ear, a fluid-filled structure
which contains the auditory receptors. However, as we shall see, the middle ear also
accomplishes a pressure amplification trick which significantly enhances the sensitivity of the
auditory system.
The sensorineural mechanism consists of the structures of the cochlea and the auditory
nerve, also known as the 8th cranial nerve. The auditory nerve conveys neural impulses
between the cochlea and the brain stem, which is part of the central auditory system. The inner
ear contains specialized sensory receptor cells called hair cells. These cells are responsible for
converting the vibratory energy that enters the auditory system into nerve impulses that are
transmitted to the central nervous system via the auditory nerve. In addition to the conversion of
vibratory energy into neural impulses, the cochlea also carries out a spectrum analysis in which
the low-frequency components of the signal are directed to one end of the cochlea and the high-frequency components are directed to the other end. As will be seen later in this chapter, the
precise role that is played by this frequency analysis is only partially understood.
The electrical signals that are generated by the hair cells in the inner ear are carried by the
auditory nerve to the central auditory system, which consists of structures in the brain stem and
auditory cortex. It is often said that the central auditory system is responsible for higher level
functions of auditory analysis, such as the "... recognition, interpretation, and integration of
auditory information ..." (Deutsch & Richards, 1979). There is little question that the central
auditory system is, in fact, heavily involved in higher level functions such as speech recognition
and the ability to recognize familiar voices and familiar melodies. However, the central auditory
system also plays a very important role in relatively low-level aspects of auditory analysis, such
as sound localization, pitch perception and, quite possibly, spectrum analysis.
The Conductive Mechanism
The Outer Ear
The outermost portion of the conductive mechanism is a cartilaginous structure called the
pinna, also known as the auricle (see Figure 4-2). While the approximately funnel shape of the
auricle might lead one to believe that the structure may play some role in sound gathering, this
appears not to be the case (von Bekesy & Rosenblith, 1958). A prominent visual characteristic of
the auricle is the rather convoluted shape consisting of a number of ridges, grooves, and
depressions. It appears that this complex topography, along with other factors, plays some role in
sound localization (von Bekesy & Rosenblith, 1958; Batteau, 1967; Freedman & Fisher, 1968).
Sound is conducted to the tympanic membrane through the external auditory meatus, also
known as the ear canal. The lateral two-thirds of the ear canal is cartilaginous and the medial
third is bone. The general shape of the ear canal
approximates that of a uniform tube, open at the
lateral end and closed medially by the tympanic
membrane. The tube averages approximately 2.3 cm
in length (Wiener & Ross, 1946). Recall that the
resonant frequency pattern of a uniform tube which
is closed at one end (by the ear drum in this case)
can be determined if its length is known. Using the
formula from Chapter 3, the lowest resonant
frequency of the ear canal should be approximately
3800 Hz (F1 = 35,000/(4 × 2.3) = 35,000/9.2 = 3804 Hz). This figure agrees well with
experimental data (Wiener & Ross, 1946; Fleming, 1939), although estimates vary. This
resonance is partially responsible for the heightened sensitivity of the auditory system to
frequencies in the middle portion of the spectrum (see Chapter 3, Figure 3-24).
Figure 4-2. The pinna or auricle. (Reprinted from Zemlin, 1968)
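The ear canal calculation can be sketched in a few lines of Python. This is a minimal illustration that treats the canal as an idealized uniform tube closed at one end, using the 35,000 cm/s speed of sound and the 2.3 cm length given in the text. The function name is our own. The higher resonances it lists follow from the standard odd-multiple pattern for such a tube, and should not be read as measurements of a real ear canal, which is not perfectly uniform.

```python
SPEED_OF_SOUND_CM_PER_S = 35_000  # value used in the text's calculation


def closed_tube_resonances(length_cm, count=3, c=SPEED_OF_SOUND_CM_PER_S):
    """Resonances of a uniform tube closed at one end.

    Uses the quarter-wavelength rule F_n = (2n - 1) * c / (4 * L),
    so the resonances fall at odd multiples of the lowest one.
    """
    return [(2 * n - 1) * c / (4 * length_cm) for n in range(1, count + 1)]


# Idealized ear canal: 2.3 cm long, closed medially by the tympanic membrane.
ear_canal = closed_tube_resonances(2.3)

# The lowest resonance is about 3804 Hz, matching the figure in the text.
print([round(f) for f in ear_canal])
```

Because the lowest resonance varies inversely with length, a shorter canal would be expected to resonate at a higher frequency under the same assumptions.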
The sound wave that enters the ear canal sets the tympanic membrane into vibration. When
instantaneous air pressure is relatively high (compression), the membrane will be forced inward,
and when instantaneous air pressure is relatively low (rarefaction), the membrane will be forced
outward. Consequently, the inward and outward movements of the tympanic membrane mirror
those of the sound wave that is driving it; for example, if the tympanic membrane is excited by a
500 Hz sinusoid, the tympanic membrane will move inward and outward sinusoidally at 500 Hz.
In general, the instant-to-instant displacements of the tympanic membrane will mirror the
instantaneous air pressure waveform that is driving the membrane.
Figure 4-3. The ear canal and middle ear cavity. Reprinted from Denes and Pinson, The
Speech Chain, 1993, W.H. Freeman & Co.
The Middle Ear
The middle ear or tympanic cavity is an air-filled chamber whose volume approximates
that of a sugar cube (see Figure 4-3). The middle ear communicates with the nasopharynx via the
Eustachian tube. This tube is approximately 35 mm in length in adults and angles downward
and forward to connect the anterior wall of the tympanic cavity with the nasopharynx. The tube
is normally closed, but opens during yawning and swallowing. When the tube opens, air can
travel either into or out of the middle ear to create an equilibrium between the air pressure inside
the tympanic cavity and that of the outside air. The Eustachian tube also plays an important role
in allowing fluids to drain from the middle ear into the nasopharynx.
In terms of the broad overview presented here, the most important structures in the
tympanic cavity are the three ossicles, a series of very small bones referred to collectively as the
ossicular chain (see Figure 4-4). The largest of the ossicles is the malleus, which attaches
directly to the tympanic membrane. The head of the malleus articulates with the incus, which in
turn connects to a very small stirrup-shaped bone called the stapes. The stapes ends in an oval
plate called the footplate. The stapes footplate attaches to an opening into the labyrinth called
the oval window. The labyrinth is a fluid-filled structure that contains the cochlea and the
vestibular system, which is responsible for our sense of balance. The stapes footplate is attached
to the oval window via a circular ligament called the annular ligament. Directly below the oval
window is a second opening into the labyrinth
called the round window. The round window is
covered by a very small membrane called the
internal tympanic membrane.
A reasonable question to ask about the
auditory system is why we have a tympanic
membrane and ossicular chain at all. Since a
primary effect of these structures is to transmit
vibrations to the fluid-filled structures of the inner
ear, then why isn't the oval window simply
covered with a flexible membrane that is driven
directly by the sound wave? Aquatic animals, in
fact, make use of a "direct-drive" system with no
middle ear. A system of this kind would work in
land animals as well, but for reasons that are explained below, a substantial loss of energy
would result. The key to understanding the role that is played by the tympanic membrane and
ossicular chain is to appreciate the energy loss that occurs when a sound wave is transmitted
from the air medium in which the sound is initially generated to the fluid medium that exists
inside the inner ear.
Figure 4-4. The auditory ossicles. (From Yost and
We know from everyday experience that we do not hear airborne sound very well when we
are underwater. The primary reason for this is that there exists an impedance mismatch between
the air medium in which the airborne sound is initially generated and the fluid medium into
which the vibratory disturbance must be transmitted in order for our underwater listener to hear
the sound. Impedance is the total opposition to the flow of energy,1 and the mismatch results
from the fact that air is a low-impedance medium while water (and other similar fluids) is a high-impedance medium. These differences in impedance can be demonstrated simply by running a
cupped hand through air and water. There is a general rule that states that energy is reflected
back toward the source when a signal reaches the boundary between two media whose
impedances do not match. In the case of air and fluid, the impedance mismatch is quite large, and
when the signal reaches the air-fluid boundary, only 1/1,000th of the energy is absorbed into the
fluid medium, with the remainder being reflected back toward the source. Represented on a
decibel scale, the loss of signal intensity is 30 dB. In the formula below, the signal intensity on
the airborne side of the air-fluid boundary serves as the reference intensity, and the signal
intensity on the fluid side of the boundary serves as the measured intensity.
1 Impedance consists of three distinct components: resistance, capacitive reactance (also known as compliant reactance), and mass
reactance (also known as inductive reactance or inertive reactance). Resistance is simply the dissipation of energy due to friction. When the
head of a thumb tack is rubbed back and forth on the surface of a table, the tack heats up because of the friction of the two surfaces. Capacitive
reactance is opposition that is offered due to the elastic properties of an object. For example, when you push against a spring, compressing it
beyond its resting state, the spring generates a force that opposes the applied force. The same kind of opposition to an applied force occurs when
a spring is stretched beyond its resting state. Mass reactance is opposition due to the inertial properties of objects; that is, the tendency of a resting
object to remain at rest, and the tendency of a moving object to remain in motion. Impedance is the vector sum of resistance and net reactance
(mass reactance minus capacitive reactance, since the two reactances oppose one another), with vector sum simply indicating that resistance and
net reactance need to be combined using the Pythagorean theorem.
dB = 10 log10 (Im/Ir)
   = 10 log10 (1/1,000)
   = 10 (-3)
   = -30 dB
The negative sign here simply means that the signal will be 30 dB weaker on the fluid side of the
boundary. Consequently, if the airborne sound wave were to directly drive a simple membrane
covering the oval window, a 30 dB loss in signal intensity would occur at the air-fluid boundary.
This is not a minor loss of energy. As we will see in the chapter on auditory perception, a 10 dB
decrease in intensity corresponds to a decrease of approximately one-half in our subjective
impression of loudness. This means that a 50 dB signal, for example, sounds only one-eighth as
loud as an 80 dB signal.
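The two calculations in this passage, the 30 dB loss at the air-fluid boundary and the one-eighth loudness comparison, can be checked with the brief Python sketch below. The function names are our own. The final part of the sketch goes one step beyond the text: it estimates the transmitted fraction from approximate characteristic impedances for air and water using the standard normal-incidence formula. Those impedance values are textbook approximations that we are assuming for illustration; they do not appear in this chapter.

```python
import math


def intensity_db(measured, reference):
    """Intensity version of the decibel formula: 10 * log10(Im / Ir)."""
    return 10 * math.log10(measured / reference)


def relative_loudness(db_change):
    """Rule of thumb from the text: each 10 dB step halves or doubles loudness."""
    return 2 ** (db_change / 10)


# From the text: only 1/1,000th of the energy crosses the air-fluid boundary.
loss_db = intensity_db(1, 1_000)

# From the text: a 50 dB signal compared with an 80 dB signal (three halvings).
loudness_ratio = relative_loudness(50 - 80)

# ASSUMPTION (not from this chapter): approximate characteristic impedances,
# in rayls, for air and water.
Z_AIR = 415.0
Z_WATER = 1.48e6

# Standard normal-incidence formula for the fraction of intensity transmitted
# across a boundary between two media.
transmitted_fraction = 4 * Z_AIR * Z_WATER / (Z_AIR + Z_WATER) ** 2
estimated_loss_db = intensity_db(transmitted_fraction, 1)

print(f"loss from the text's 1/1,000 figure: {loss_db:.1f} dB")
print(f"loudness of 50 dB relative to 80 dB: {loudness_ratio}")
print(f"loss estimated from assumed impedances: {estimated_loss_db:.1f} dB")
```

Under these assumed impedances the estimated loss comes out close to the 30 dB figure used in the text, which suggests that the 1/1,000 value is a reasonable round number.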
One of the primary functions of the middle ear is to amplify pressure so as to overcome a
large portion of this energy loss. This is accomplished in two ways: (1) an increase in pressure
that occurs when the vibrations that are picked up on the relatively large surface area of the
tympanic membrane are focused on the very small surface area of the stapes footplate, and (2) an
increase in force (and therefore pressure as well) that occurs as a result of the mechanical lever
action of the ossicular chain. The "area trick," known as the condensation effect, is by far the
more important of the two effects. Recall from Chapter 2 that there is an important distinction
between force and pressure: force is the amount of push or pull on an object, and is the product
of mass and acceleration; pressure, on the other hand, is force per unit area. A major implication
of this relationship is that pressure can be amplified without a change in force simply by
decreasing the area over which the force is delivered. This is the design principle underlying
thumb tacks and knives with sharp cutting edges, and exactly this principle is at work in the
middle ear as the energy that is delivered to the relatively large area of the tympanic membrane
is focused on the very small area at the stapes footplate. The amount of pressure amplification
that results from this concentration of force is proportional to the ratio of the two areas that are
involved. The effective area of the tympanic membrane is approximately 0.594 cm², while the
area of the stapes footplate is approximately 0.032 cm² (Durrant & Lovrinic, 1984).
Consequently, pressure at the stapes footplate will be approximately 18.6 times greater than
pressure at the tympanic membrane (0.594/0.032 = 18.6). This pressure amplification can be
represented on a decibel scale. Since we are talking about an increase in pressure, the pressure
version of the decibel formula is needed:
dB = 20 log10 (0.594/0.032)
= 20 log10 (18.6)
= 20 (1.27)
= 25.4 dB
Consequently, of the 30 dB that would be lost at the air-fluid boundary, the condensation effect
makes up for roughly 25 dB.
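The condensation effect can be illustrated with a short Python sketch. It applies the same (arbitrary) force to the two areas given in the text and compares the resulting pressures. The function names and the unit force are hypothetical choices made for illustration only.

```python
import math

TYMPANIC_AREA_CM2 = 0.594   # effective area of the tympanic membrane (from the text)
FOOTPLATE_AREA_CM2 = 0.032  # area of the stapes footplate (from the text)


def pressure(force, area):
    """Pressure is force per unit area."""
    return force / area


def pressure_db(ratio):
    """Pressure version of the decibel formula: 20 * log10(ratio)."""
    return 20 * math.log10(ratio)


# Hypothetical force in arbitrary units; only the ratio of pressures matters.
force = 1.0

p_drum = pressure(force, TYMPANIC_AREA_CM2)
p_footplate = pressure(force, FOOTPLATE_AREA_CM2)

# The same force on a smaller area gives a proportionally larger pressure.
area_gain = p_footplate / p_drum
area_gain_db = pressure_db(area_gain)

print(f"pressure gain: {area_gain:.1f} times, or {area_gain_db:.1f} dB")
```

Note that the force itself is unchanged in this sketch. The gain comes entirely from delivering that force to a smaller area, which is the point of the thumb tack analogy.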
Figure 4-5. The mechanical lever advantage of the ossicular chain. Adapted from Denes and
Pinson, The Speech Chain, 1993, W.H. Freeman & Co.
A small amount of additional amplification results from the lever action of the ossicular
chain. The basic idea is that the ossicular chain is suspended by ligaments in such a way as to
form a lever system, with the fulcrum on the body of the incus. One arm of the lever system
consists of the malleus while the other arm consists of the incus (see Figure 4-5). The malleus
lever arm is approximately 30% longer than the incus lever arm, producing a lever ratio of
1.3:1. Since the force amplification that occurs in any lever system is proportional to the ratio of
the lengths of the two lever arms, force will be amplified by a factor of 1.3. Pressure is the force
per unit area, so this increase in force means that pressure will also be amplified by a factor of
1.3. Represented on a decibel scale, this amounts to:
dB = 20 log10 (1.3)
   = 20 (0.114)
   = 2.3 dB
(Notice that the pressure version of the decibel formula is being used here rather than the
intensity version. That is because the lever advantage produces an increase in force and,
therefore, pressure.) If this 2.3 dB pressure amplification is added to the 25.4 dB that is produced
by the condensation effect, we find that the combined action of the middle ear system results in a
pressure amplification of 25.4+2.3 = 27.7 dB, nearly all of the 30 dB that would otherwise be
lost at the air-fluid boundary.
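The combined middle ear gain can be worked through in Python as a check on the arithmetic. The sketch below uses only the area and lever figures quoted in the text, and the names are our own. It also illustrates a general property of decibels: multiplying two pressure ratios is equivalent to adding their decibel values.

```python
import math


def pressure_db(ratio):
    """Pressure version of the decibel formula: 20 * log10(ratio)."""
    return 20 * math.log10(ratio)


AREA_RATIO = 0.594 / 0.032  # tympanic membrane area / stapes footplate area
LEVER_RATIO = 1.3           # malleus lever arm / incus lever arm
BOUNDARY_LOSS_DB = 30       # loss at the air-fluid boundary, from the text

area_gain_db = pressure_db(AREA_RATIO)
lever_gain_db = pressure_db(LEVER_RATIO)

# Multiplying the two ratios gives the same result as adding the two dB values.
total_gain_db = pressure_db(AREA_RATIO * LEVER_RATIO)
shortfall_db = BOUNDARY_LOSS_DB - total_gain_db

print(f"area gain:  {area_gain_db:.1f} dB")
print(f"lever gain: {lever_gain_db:.1f} dB")
print(f"total gain: {total_gain_db:.1f} dB")
print(f"remaining loss: {shortfall_db:.1f} dB")
```

Computed without intermediate rounding, the total is a few hundredths of a decibel below the 27.7 dB obtained by adding the rounded values 25.4 and 2.3; the difference is a rounding artifact only.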
The Sensorineural Mechanism
The two major auditory structures of the sensorineural mechanism are the cochlea and the
auditory nerve. The cochlea is one portion of a larger structure called the labyrinth. As noted
earlier, the labyrinth contains both the cochlea (the organ of hearing) and the vestibular system
(the organ of balance). The three major divisions of the labyrinth are shown in Figure 4-6. The
snail-shaped portion of the labyrinth is the cochlea, which contains the hair cells and many other
structures that are important for hearing. The upper portion of the labyrinth contains three
structures called the semicircular canals, which are part of the vestibular system. The middle
portion of the labyrinth is called the vestibule. The oval window and round window are openings
into the vestibule.
The portion of the labyrinth that is shown in panel a of Figure 4-7 is a hollowed-out and
fluid-filled bony shell called the bony or osseous labyrinth. Fully contained within the bony
labyrinth is a fluid-filled structure called the membranous labyrinth, which can be thought of
as something like a convoluted water balloon that fits inside the bony labyrinth (see panel b of
Figure 4-7). The fluid that courses through the membranous labyrinth is called endolymph and
the fluid outside the membranous labyrinth is called perilymph. Two bulges in the membranous
labyrinth called the utricle and saccule are part of the vestibular system. The portion of the
membranous labyrinth that is contained within the cochlea is called the cochlear duct. The end
of the cochlea that is closest to the vestibule is called the base or basal end, and the end that is
furthest from the vestibule is called the apex or apical end. The cochlea is divided into three
canals or scalae: the scala vestibuli, which lies above the cochlear duct, the scala tympani,
which lies below the cochlear duct, and the cochlear duct itself, which is also known as the scala
media. The three canals are shown in highly schematic form in a partially unrolled cochlea in
Figure 4-8. A small gap at the apical end of the cochlea called the helicotrema allows the
perilymph in the scala vestibuli and the scala tympani to communicate.
Figure 4-6. The labyrinth. From Zemlin (1968).
Figure 4-7. The bony labyrinth (panel a) and the membranous labyrinth (the unshaded portion
of panel b). Reprinted from Minifie, Hixon, and Williams (1973).
Anatomy of the Cochlea
Some of the views that are shown of the cochlea can be a bit difficult to interpret simply
because of the coiled shape. Since the coiling is strictly a space-saving feature that has
essentially no effect on cochlear physiology, the cochlea is often shown in unrolled form. Panels
a and b of Figure 4-9 show views that result from two kinds of cuts through an unrolled cochlea.
Panel c shows a highly schematic picture of what the view would look like if a cut were made
through the cochlea in its coiled-up form.
A more detailed picture of the view in panel c can be seen in Figure 4-10. Shown in this
figure are the basal, medial, and apical turns of the cochlea, which are wrapped around a bony
core called the modiolus. We can imagine building a structure similar to the cochlear portion of
the labyrinth by coiling a length of garden hose approximately 2 3/4 turns around wet plaster,
and then allowing the plaster to dry. The plaster is analogous to the modiolus, and the garden
hose is the cochlea. Entering through a tunnel in the modiolus is the cochlear branch of the
auditory nerve. The vestibular branch of the 8th cranial nerve, which is not shown in this figure,
innervates sensory receptors in the vestibular system. Figure 4-11 shows a detailed view of a
single cross-section through the cochlea, corresponding to the cut shown in panel b of Figure
4-9. Fibers from the auditory nerve enter the cochlear duct through a tunnel in a thin shelf of bone
called the spiral lamina. The opening in the spiral lamina through which the auditory nerve
fibers enter is called the habenula perforata. The collection of 8th nerve cell bodies in the
modiolus is called the spiral ganglion (Figure 4-11). The spiral lamina is covered with a layer of
fibrous tissue called the limbus. As Figure 4-10 shows, the cochlear duct is a triangular-shaped
partition of the cochlea that is formed by two membranes: Reissner's membrane, which
separates the cochlear duct from the scala vestibuli, and the basilar membrane, which separates
the cochlear duct from the scala tympani. The basilar membrane is held in place by the spiral
ligament. Covering the spiral ligament in the scala media is a layer of endolymph-secreting
vascular tissue called the stria vascularis.
Figure 4-8. Schematic of a partially unrolled cochlea showing the scala vestibuli, the scala
media, and the scala tympani. Adapted from Zemlin (1968).
Figure 4-9. Cuts through an unrolled (panels a and b) and rolled (panel c) cochlea. Reprinted
from Deutsch and
Figure 4-10. The cochlea, modiolus, and auditory nerve. Reprinted from Deutsch and Richards
(1979).
Figure 4-11. A cross-section of the cochlea. Reprinted from
The set of structures that are resting on the basilar membrane are referred to collectively as
the organ of Corti. A more detailed look at the organ of Corti is shown in Figure 4-12.
Emerging from the limbus and lying immediately above the hair cells is a gelatinous membrane
called the tectorial membrane. The hair cells are arranged in rows consisting of a single inner
hair cell (IHC) and either three or four outer hair cells (OHC), with three OHCs being more
common. The hair-like structures emerging from the tops of the hair cells are called cilia. The
structure and function of hair cells will be discussed later in this chapter. The human cochlea
contains approximately 3,000 - 3,500 arrangements such as those shown in Figure 4-12,
consisting of approximately 3,000 - 3,500 IHCs and approximately 10,000 - 12,000 OHCs. In
later discussions we will refer to this unit consisting of one IHC and three or four OHCs as a
channel. The hair cells are innervated by approximately 30,000 auditory nerve fibers (Spoendlin,
1989), which connect to the base of the hair cells. The overwhelming majority (~98%) of these
auditory nerve fibers are afferent, i.e., conveying neural impulses away from the hair cells in the
direction of the central nervous system. In turn, the overwhelming majority (~95%) of these
afferent fibers are connected to the IHCs as opposed to the OHCs, meaning that it is almost
exclusively the IHCs that are responsible for conveying sensory information to the central
nervous system (Spoendlin, 1974). On average, there are approximately 10 auditory nerve fibers
connected to each IHC. Individual auditory nerve fibers that innervate IHCs typically supply a
single IHC, rather than sprouting branches to many other IHCs. Exactly the opposite is true of
fibers that innervate OHCs, where a single nerve fiber branches to innervate many OHCs. These
differences in innervation patterns are shown in Figure 4-13.
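The innervation figures quoted above can be cross-checked with a little arithmetic. The Python sketch below uses only the counts and percentages given in the text, and the variable names are our own. It estimates how many afferent fibers serve the IHCs and how many that leaves per IHC, along with the number of OHCs per channel.

```python
TOTAL_FIBERS = 30_000             # auditory nerve fibers (from the text)
AFFERENT_FRACTION = 0.98          # share of fibers that are afferent
IHC_SHARE_OF_AFFERENTS = 0.95     # share of afferent fibers connected to IHCs
IHC_COUNT_RANGE = (3_000, 3_500)  # approximate number of IHCs
OHC_COUNT_RANGE = (10_000, 12_000)  # approximate number of OHCs

afferent_fibers = TOTAL_FIBERS * AFFERENT_FRACTION
ihc_afferents = afferent_fibers * IHC_SHARE_OF_AFFERENTS

# The fewest fibers per IHC occurs with the most IHCs, and vice versa.
fibers_per_ihc_low = ihc_afferents / IHC_COUNT_RANGE[1]
fibers_per_ihc_high = ihc_afferents / IHC_COUNT_RANGE[0]

# OHCs per channel, pairing the low counts together and the high counts together.
ohc_per_channel = [
    OHC_COUNT_RANGE[0] / IHC_COUNT_RANGE[0],
    OHC_COUNT_RANGE[1] / IHC_COUNT_RANGE[1],
]

print(f"afferent fibers reaching IHCs: about {ihc_afferents:,.0f}")
print(f"fibers per IHC: about {fibers_per_ihc_low:.1f} to {fibers_per_ihc_high:.1f}")
print(f"OHCs per channel: about {ohc_per_channel[0]:.1f} to {ohc_per_channel[1]:.1f}")
```

The result, roughly 8 to 9 fibers per IHC, is in line with the text's figure of approximately 10, given that all of the inputs are themselves approximate.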
As will be seen below, the hair cells generate an electrical signal in response to the
traveling wave motion of the basilar membrane. These electrical disturbances in turn cause
depolarization of the auditory nerve fibers that are attached at the base of the hair cells. The
physiology of nerve fibers and the precise meaning of the term depolarization will be explained
later in this chapter. For the time being, it is necessary only to understand that the ultimate result
of the action of the organ of Corti is the generation of an electrical spike on the auditory nerve
fibers that innervate the hair cells. The nature of the basilar membrane traveling wave and the
mechanisms that are thought to be involved in the generation of the electrical disturbance in the
hair cells will be described below.
Figure 4-12. A cross-section of the cochlea. Reprinted from
Figure 4-14. The cochlear duct is formed by two membranes:
Reissner's membrane above, and the basilar membrane below.
In this simplified drawing the duct is represented as a single
structure called the cochlear partition. Inward movement of
the stapes causes a downward deflection of the cochlear
partition, and the fluid pressure is resolved by an outward
deformation at the round window. Outward motion of the stapes
has the opposite effect: the partition is deflected upward, and
the fluid pressure is resolved by an inward deformation at the
round window. Based on von Bekesy (1960) and reprinted from
Durrant and Lovrinic (1984).
Figure 4-13. Innervation patterns for inner and outer hair cells.
Note that a single IHC is typically innervated by many nerve
fibers, while individual nerve fibers innervating OHCs typically
branch to supply several receptor cells. After Spoendlin (1979).
Figure 4-15. The basilar membrane varies continuously in
stiffness from base to apex. The greater stiffness of the
membrane at the base makes the basal end respond better to
high frequencies than low frequencies, while the opposite is true
of the apical end. After von Bekesy (1960), Rhode (1973), and
Durrant & Lovrinic (1984).
The Traveling Wave
Figure 4-14 shows a simplified view of an uncoiled cochlea. The cochlear duct in this
figure has been greatly simplified and is represented as though it were a single membrane,
attached on either side and running from the base to the helicotrema. We know, of course, that
the cochlear duct is formed by two membranes: Reissner's membrane above and the basilar
membrane below. These two membranes are often referred to collectively as the cochlear
partition, and for the purposes of understanding the movement dynamics of this system we can
consider the partition as consisting of just a single membrane. Further, it is primarily the
mechanical properties of the basilar membrane that control the most important movement
characteristics of the cochlear partition. The single most important fact about the basilar
membrane is that its stiffness varies systematically from the base to the apex (see Figure 4-15).
Specifically, the basilar membrane is stiffer at the base than the apex. Recall from our discussion
of spring-and-mass systems in Chapter 3 that natural vibrating frequency rises with
stiffness; i.e., as stiffness increases, natural vibrating frequency increases. This
means that the basal end of the basilar membrane, which is stiff, will respond best to high
frequencies, and the apical end, which is less stiff, will respond best to low frequencies.
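The link between stiffness and natural frequency can be illustrated with the standard formula for a simple spring-and-mass system, f = (1/2π)√(k/m). The Python sketch below is a toy model only. The stiffness and mass values are hypothetical numbers chosen to show the trend, and the basilar membrane is of course not literally a row of independent springs.

```python
import math


def natural_frequency_hz(stiffness, mass):
    """Natural frequency of a simple spring-and-mass system.

    f = (1 / (2 * pi)) * sqrt(k / m)
    """
    return math.sqrt(stiffness / mass) / (2 * math.pi)


# HYPOTHETICAL values in arbitrary units: stiffness falls from base to apex
# while mass is held constant.
MASS = 1.0
stiffness_base_to_apex = [10_000.0, 1_000.0, 100.0, 10.0, 1.0]

frequencies = [natural_frequency_hz(k, MASS) for k in stiffness_base_to_apex]

# Stiffer (basal) positions have higher natural frequencies than
# less stiff (apical) positions.
for k, f in zip(stiffness_base_to_apex, frequencies):
    print(f"stiffness {k:>8.0f} -> natural frequency {f:6.2f} Hz")
```

In this simple model frequency grows with the square root of stiffness, so a 100-fold change in stiffness produces a 10-fold change in natural frequency.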
Figure 4-16. The basilar membrane traveling wave. Panel a shows a sequence of snapshots of the traveling
wave (reprinted from Ranke, 1942). Panel b shows a single snapshot of the traveling wave (reprinted from
When an acoustic stimulus is delivered to the ear, the vibratory pattern is picked up at the
tympanic membrane and transmitted through the ossicles, resulting in inward and outward
movements at the stapes footplate. As shown in Figure 4-14, the inward pressure of the stapes on
the incompressible cochlear fluid causes the cochlear partition to deflect in the direction of the
scala tympani. This fluid pressure is resolved by an outward deflection of the internal tympanic
membrane, which covers the round window. Similarly, during a rarefaction phase the outward
motion of the stapes will cause the cochlear partition to deflect in the direction of the scala
vestibuli, pulling the internal tympanic membrane inward. These upward and downward
deflections of the cochlear partition result in the generation of a displacement pattern with a
highly specific shape called a traveling wave. The basilar membrane traveling wave was
discovered in a series of ingenious experiments by Georg von Bekesy, which earned him the
Nobel Prize in Physiology or Medicine. As shown in Figure 4-16, the traveling wave moves
from the base toward the apex, with the amplitude rising rather gradually to a peak, and then
decaying rather suddenly after reaching a peak. Panel a shows what a sequence of snapshots of
the traveling wave might look like. The smooth curve in panel a is the envelope or amplitude
envelope of the traveling wave. A more detailed view of a single snapshot of the traveling wave
pattern is shown in panel b of Figure 4-16. As shown in Figure 4-15, the point along the basilar
membrane where the traveling wave reaches its maximum amplitude will be strongly affected by
the frequency of the input signal: high frequency signals will reach peak amplitude near the
basal end of the cochlea, where the basilar membrane is stiffer, while low frequency signals will
reach peak amplitude near the apical end of the cochlea, where the basilar membrane is less
stiff. What this means is that low-frequency signals will be directed toward the apical end of the
basilar membrane, high-frequency signals will be directed toward the basal end of the basilar
membrane, and mid-frequency signals will be directed toward an appropriate place in the middle
of the basilar membrane. As will be seen below, this frequency-dependent behavior of the basilar
membrane traveling wave will be reflected in the pattern of 8th-nerve electrical activity; that is,
for low-frequency signals, 8th-nerve electrical activity will be greatest for fibers connected to
hair cells at the apical end of the cochlea, while for high-frequency signals, 8th-nerve electrical
activity will be greatest for fibers connected to hair cells at the basal end of the cochlea. This
relationship between the frequency of the input signal and the point of maximum basilar
membrane motion is one of the most important properties of cochlear analysis, and it is the key
to understanding what has been called the place theory or the rate-place model of auditory
spectrum analysis, which will be discussed later in this chapter.
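The mapping between frequency and place along the basilar membrane can be made concrete with a brief Python sketch. The chapter itself does not give a formula for this map. The sketch below assumes Greenwood's (1990) empirical frequency-position function for the human cochlea, with its commonly quoted constants, and it is offered only as one illustrative approximation.

```python
def greenwood_frequency_hz(x_from_apex):
    """Approximate best frequency at a position along the basilar membrane.

    ASSUMPTION: Greenwood's empirical function for the human cochlea,
    F = A * (10 ** (a * x) - k), with A = 165.4, a = 2.1, k = 0.88.
    x_from_apex is the proportion of the membrane's length measured
    from the apex (0.0 = apex, 1.0 = base).
    """
    A, a, k = 165.4, 2.1, 0.88
    return A * (10 ** (a * x_from_apex) - k)


# Sample the map at five evenly spaced places from apex to base.
positions = [0.0, 0.25, 0.5, 0.75, 1.0]
best_frequencies = [greenwood_frequency_hz(x) for x in positions]

for x, f in zip(positions, best_frequencies):
    print(f"{x:4.2f} of the way from apex to base -> about {f:8.0f} Hz")
```

Under this assumed function the apical end maps to roughly 20 Hz and the basal end to roughly 20,000 Hz, which is consistent with the low-to-high ordering described in the text.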
Figure 4-17. The upward and downward movement of the basilar membrane produces a shearing force on
the hair cell cilia, causing them to pivot at their base. Reprinted from Ryan and Davis (1976).
There is one additional fact about basilar membrane motion that is necessary for
understanding how the sensorineural system converts vibration into neural impulses. As shown
in Figure 4-17, the upward and downward movement of the basilar membrane produces
something called a shearing force on the hair cell cilia that results in the side-to-side movement
of the cilia. In other words, as the basilar membrane vibrates up and down, the cilia are
alternately forced away from and then toward the modiolus.² As explained below, it is this
movement of the cilia that produces excitation of the hair cells which, in turn, results in the
depolarization of the auditory nerve fibers that innervate the hair cells.
All sensory receptors are examples of a general class of device called transducers. In all
cases the function of a transducer is to convert energy of one form into energy of a different
form. Common examples of transducers include microphones, which convert acoustic energy
into electrical energy, and loudspeakers, which perform precisely the opposite type of
transduction, converting the electrical energy coming from an amplifier into acoustic energy. In
the case of sensory receptors, the job is to convert stimulation of various sorts into an electrochemical code consisting of a sequence of neural impulses. In the visual system the incoming
optical stimulus is converted by the rods and cones of the retina into a series of neural impulses
on the optic nerve that are interpreted by the brain as a visual image. In the auditory system, the
type of transduction that takes place involves the conversion of the mechanical vibration that
reaches the basilar membrane into a series of impulses on the auditory nerve.
Figure 4-18. A hair cell. Note that the cilia are arranged
according to height and interconnected by fine filaments
called transduction links. Adapted from Hudspeth (1994).
Figure 4-19. Transmission electron micrograph of a longitudinal
section of outer hair cells. Reprinted from Kimura (1966).
² The mechanism of cilia motion that is shown in Figure 4-17 is the most widely accepted view. As simple and intuitive as this model might
seem, it may well be incorrect. The motion pattern shown in Figure 4-17 will have to suffice for the relatively cursory review presented here. For
our purposes the important point is that movement of the hair cell cilia occurs as a direct result of the movement of the basilar membrane. This
much is well established, although the detailed mechanism is not well understood. See Gelfand (1990) and Zwislocki (1984, 1985, 1986) for a
discussion of the potential problems with the view presented in Figure 4-17.
While there are many aspects of auditory transduction that remain poorly understood, there
is complete agreement that the site of transduction is the hair cell, which generates an electrical
potential that stimulates impulses in the 8th-nerve fibers that are connected to its base. The chain
of events, which will be described below, includes the following: (1) the vibration of the basilar
membrane causes the hair-cell cilia to bend at their base, (2) this "shearing" of the cilia results in
the flow of electrical current through the hair cell that is called the receptor current, (3) the
receptor current stimulates the release by the hair cell of neurotransmitter chemicals, and (4)
uptake of this neurotransmitter substance by dendrites in an 8th-nerve fiber connected to the base of the
hair cell stimulates an all-or-none action potential in the nerve fiber.
A schematic drawing of a single hair cell is shown in Figure 4-18. In close proximity to the
base of the hair cell are auditory nerve fibers.³ The cilia projecting from the top of the hair cell
serve a crucial function in transduction. Note that the cilia are arranged according to height, with
the shortest cilia being closest to the spiral lamina. This feature can be seen clearly in the
electron micrograph shown in Figure 4-19. Also shown in the schematic drawing in Figure 4-18
is a series of very thin filaments called transduction links that serve to attach the adjacent cilia
of differing heights. The cilia themselves are exceedingly stiff and are effectively hinged at their
base. As a result, the application of a force to the hair bundle causes the cilia to pivot at their
base rather than bow. Figure 4-21 shows the position of the cilia at rest (panel a), and after the
application of a force either in the direction of the tall cilia (panel b), or in the direction of the
short cilia (panel c). For reasons that are explained below, it is the cilia motion pattern shown in
panel b, in response to the movement of the basilar membrane, which is ultimately responsible
for the stimulation of the 8th nerve fiber.
To see how the transduction process occurs it is necessary to understand two important
electrical potentials that exist within the cochlea. An electrode inserted into the body of the hair
cell will record a negative voltage of approximately -60 millivolts (mV), while an electrode
inserted into the endolymphatic fluid in the scala media (which is electrically separated from the
hair cell body) will record a positive voltage of approximately +80 mV. Consequently, the
difference in electrical potential between the hair cell body and the endolymphatic fluid that lies
outside of the cell body is approximately 140 mV. As will be seen below, this difference in
electrical potential serves as a biological battery that supplies the energy source for the
generation of the receptor current.
³ The single afferent fiber and single efferent fiber that are shown in Figure 4-18 should not be taken too seriously. Recall that: (a) the typical
IHC is innervated by several afferent fibers, (b) the great majority of afferent fibers innervate IHCs rather than OHCs, and (c) only about 2% of
all fibers innervating hair cells are efferent.
Figure 4-20. A simple electrical circuit consisting of a
battery in series with a variable resistor and a current meter.
Figure 4-22. Davis' model of hair cell function. Reprinted
from Davis (1965).
Figure 4-21. Hair cell cilia: (a) at rest, (b) being displaced in the
direction of the tall hairs (i.e., away from the modiolus), and (c)
being displaced in the direction of the short hairs (i.e., toward the modiolus).
Figure 4-23. Model of hair cell transduction proposed by
Pickles (1984) and Hudspeth (1985). Adapted from
The theory of hair cell function that has enjoyed the widest acceptance is a surprisingly
straightforward model (at least in broad strokes) that was proposed a number of years ago by
Davis (1965), although many important details of hair cell transduction have only been
uncovered within the last few years. To understand how the Davis model works, consider the
simple electrical circuit in Figure 4-20. The circuit consists of a battery connected in series with
a device called a variable resistor (also known as a rheostat). The meter has been placed in the
circuit simply to record how much electrical current is flowing. A variable resistor is simply an
electrical resistor whose resistance value can be varied. Volume control dials on devices such as
televisions and radios are variable resistors, as are the dimmer dials that are often found in dining
rooms. Turning the volume down is a matter of setting the dial on the variable resistor to a high
resistance position, limiting the flow of electrical current to a small value. In our simple circuit,
this high resistance value would be reflected by a very small deflection on the current meter. On
the other hand, turning the volume up involves setting the dial on the variable resistor to a low
resistance position, resulting in a large flow of electrical current and, consequently, the current
meter would show a large deflection. The battery in this simple circuit corresponds to the
roughly 140 mV difference in electrical potential between the hair cell body and the endolymph.
According to Davis, the hair cell cilia behave like variable resistors whose resistance values
change as they pivot at their base. These changes in electrical resistance modulate the flow of
ions between the endolymphatic fluid and the hair cell. (An ion is an atom with either a surplus
of electrons, giving it a negative charge, or a deficit of electrons, giving it a positive charge.) A
drop in resistance is accompanied by the flow of electrical current, and this current flow is the receptor current.
It is now known that the specific type of cilia motion that produces the required resistance
drop is movement in the direction of the taller cilia; that is, the kind of motion that is depicted in
panel b of Figure 4-21. Electrical resistance offered by the cilia when they are standing straight
up (Figure 4-21, panel a) is very high and becomes even higher when the cilia are sheared in the
direction of the shorter cilia (Figure 4-21, panel c), resulting in inhibition of the receptor current
rather than excitation. A more complex version of the electrical circuit envisioned by Davis is
shown in Figure 4-22.
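The variable-resistor idea can be illustrated with Ohm's law. The sketch below takes the 140 mV "battery" from the text; the three resistance values are hypothetical numbers chosen only to respect the ordering described above (lowest when the cilia move toward the tall cilia, very high at rest, higher still toward the short cilia), and the names are illustrative rather than part of the Davis model.

```python
# Battery: endolymph (+80 mV) minus hair cell interior (-60 mV) = 140 mV.
BATTERY_MV = 80.0 - (-60.0)

# Hypothetical resistances in megohms; only their ordering follows the text.
RESISTANCE_MOHM = {
    "toward tall cilia": 50.0,     # resistance drops: excitation
    "at rest": 500.0,              # very high resistance
    "toward short cilia": 1000.0,  # even higher resistance: inhibition
}


def receptor_current_na(position: str) -> float:
    """Ohm's law, I = V / R. Millivolts divided by megohms gives nanoamperes."""
    return BATTERY_MV / RESISTANCE_MOHM[position]


for position in RESISTANCE_MOHM:
    print(f"{position}: {receptor_current_na(position):.2f} nA")
```

With these made-up resistances the current is largest for movement toward the tall cilia and smallest for movement toward the short cilia, which is the qualitative pattern the model predicts.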
A theory proposed by Pickles et al. (1984) and Hudspeth (1985) attempts to explain why
this change in electrical resistance occurs when the hair bundle pivots in the direction of the
taller cilia. A schematic of the model is shown in Figure 4-23. Recall that very fine filaments
called transduction links connect adjacent cilia of different heights. According to this "gate-spring" model, movement of the hair bundle in the direction of the taller cilia has the effect of
stretching these transduction links, while movement of the hair bundle in the direction of the
shorter cilia has the effect of compressing the transduction links (see Figure 4-23). As seen in the
figure, stretching the spring-like transduction links has the effect of opening a pore or "molecular
gate," allowing ions to flow. On the other hand, movement in the direction of the shorter cilia,
which compresses transduction links, has the effect of squeezing the molecular gate closed,
inhibiting the flow of ions. As a result, the receptor current tends to be generated primarily
during that half of each cycle of vibration that results in movement of the hair bundle in the
direction of the taller cilia.
Figure 4-24. Idealized relationship between the receptor potential generated by the hair cells and the input
signal. The receptor potential is a graded response that preserves the shape of the input signal, except that
it is half-wave rectified, meaning that the receptor current increases only when the hair bundle shears
away from the modiolus; i.e., shorter hairs shearing in the direction of taller hairs.
To complete the transduction story, it is necessary to note that the receptor current (i.e., the
electrical current that flows through the hair cell) stimulates the release by the hair cell of
neurotransmitter chemicals which, in turn, stimulate the depolarization of the auditory nerve
fiber. Although the causal link between these two events is well established, the precise
mechanism that relates the generation of the receptor current to the release of neurotransmitter substance
is currently not well understood.
A crucial fact about the nerve-stimulating electrical disturbance that is generated by the
hair cell is that it is a graded signal. This means that the instantaneous amplitude of the hair cell
current varies continuously depending on the instantaneous amplitude of the shearing force that
is applied to the hair bundle (which, in turn, varies continuously depending on the amplitude of
the basilar membrane traveling wave). To say that the receptor current varies continuously with
the amplitude of the shearing force simply means that when the shearing force is low the receptor
current will be low, when the shearing force is large the receptor current will be high, and when
the shearing force is intermediate in size the receptor current will be intermediate. Figure 4-24 is
an idealized representation showing how the receptor current varies over time for two input
signals. The main point to be made about this figure is that the changes over time in receptor
current faithfully model the shape of the input signal, with one critical exception: since the hair
cell is stimulated to generate a receptor current by shearing of the cilia in one direction only, the "bottom
half" of the signal is missing. The name that is given to this process in which only one polarity of
a signal is preserved is half-wave rectification. The main point, then, is that the hair cell
receptor current faithfully models a half-wave rectified version of the input signal (though with some
restrictions, which will not be discussed here).
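Half-wave rectification is easy to demonstrate numerically. The following is a minimal sketch, assuming a 100 Hz sinusoid sampled at 8,000 Hz as a stand-in for the input signal; the function names and parameter values are illustrative and are not taken from the chapter.

```python
import math


def sinusoid(freq_hz: float, duration_s: float, sample_rate_hz: int = 8000):
    """Return samples of a unit-amplitude sine wave."""
    n_samples = round(duration_s * sample_rate_hz)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate_hz)
            for i in range(n_samples)]


def half_wave_rectify(samples):
    """Keep the positive half of the signal and replace the rest with zero."""
    return [max(0.0, s) for s in samples]


signal = sinusoid(freq_hz=100, duration_s=0.02)  # two cycles of a 100 Hz tone
rectified = half_wave_rectify(signal)

# The positive half-cycles are unchanged; the negative half-cycles are flat.
print(min(signal), min(rectified))
```

The rectified list preserves the shape of the input wherever the input is positive, which corresponds to the idealized receptor current in Figure 4-24.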
The graded nature of the receptor current stands in contrast to the all-or-none nature of the
electrical potential that is generated by the auditory nerve fiber. As will be seen, this graded,
continuously varying receptor current will be translated by the auditory nerve into a sequence of
discrete on-off pulses called action potentials. The mechanism involved in the generation of
these action potentials on the auditory nerve will be described below.
Figure 4-25. The dendrites of one neuron synapsing with the axon of an adjacent neuron. Reprinted from
Denes and Pinson, The Speech Chain, 1993, W.H. Freeman & Co.
Neurons are highly specialized cells that form the basic building blocks of the nervous
system. The human brain contains roughly 86 billion neurons. Neurons can vary
considerably in their detailed structure, but all neurons share a common architecture, which is
illustrated in Figure 4-25. The portion of the neuron containing the nucleus is called the cell
body. The long projection extending from the cell body, called the axon, carries electrochemical
information away from the cell body. Axons terminate in branch-like endings called nerve
endings. Axon lengths can vary considerably from one neuron to the next, with the longest
axons extending a meter or more. The bushy endings on the other side of the cell body are called
dendrites; these extensions convey electrochemical information in the direction of the cell body.
The microscopic spaces that exist between the nerve endings of an axon and the dendrites of an
adjacent neuron are called synapses.
Effector Cells and Receptor Cells
Neurons communicate not only with other neurons, but also with two other kinds of
specialized cells: effector cells and receptor cells. A common example of an effector cell is a
muscle fiber, which receives its stimulus to contract from a neuron. Receptor cells, on the other
hand, receive sensory information from stimuli such as light, sound, and touch, and convey this
information to adjacent sensory neurons. In the auditory system, hair cells serve as receptor cells,
and the neurons that convey information from the hair cells to the central nervous system are
auditory nerve fibers.
Generation of an Action Potential
The electrochemical signal that is generated by a neuron is called an action potential. The
energy source that supplies the power for the generation of an action potential is a difference in
electrical charge between the cytoplasm inside the neuron and the extracellular fluid that lies
outside of the cell membrane. If one electrode is placed inside the cell membrane of a neuron in
its resting state and a second electrode is placed in the extracellular fluid just outside of the cell
membrane, a voltmeter will show an electrical potential of about -50 mV, with the cell body
being negative with respect to the extracellular fluid (see Figure 4-26, panel a). The difference in
charge exists because of unequal concentrations of positively and negatively charged ions within
the cell as compared to the extracellular fluid. The neuron in this state is said to be polarized,
and the difference in charge can be thought of as constituting a biological battery in the same
sense as the electrical potential that serves as the energy source for the generation of the receptor
potential in the Davis hair cell model.
Figure 4-26. Propagation of an action potential along a nerve fiber.
There are several different kinds of events that can stimulate the depolarization of a neuron,
resulting in the propagation of an action potential along the axon. The most important of these
events is the uptake of neurotransmitter chemicals by the dendrites of the neuron. These
neurotransmitter chemicals are released either by an adjacent neuron or an adjacent receptor cell,
such as a hair cell. Depolarization of the neuron begins with an increase in the permeability of
the cell membrane, allowing positively charged ions to enter the cell and negatively charged ions
to exit. The result is a very rapid change in the electrical potential of the cell. Panel b of Figure 4-26 shows the electrical state of a neuron at a location that the depolarization
disturbance has reached.
An imperfect but useful analogy can be drawn between the propagation of an action
potential on an axon and the combustion that propagates along the length of a fuse. The analogy
is useful in making three important points about the propagation of an action potential. First, the
event is a self-sustaining chain reaction. In the case of a fuse, the combustion in one local area of
the fuse causes an adjacent area to combust, and in the case of the action potential, it is the local
electrochemical disturbance that spreads to adjoining regions of the axon. Second, the energy
that supports the propagation of the action potential comes from the fiber itself, and not the
stimulating event, just as the energy that is responsible for the propagation of combustion along a
fuse comes from the fuse and not the match that was used to light the fuse. Finally, combustion
along a fuse is an all-or-none event, meaning that the fuse will either burn or fail to burn, and the
amount of heat that is generated along the fuse will not be graded depending on the size of the
match that was used to light it. This all-or-none law is one of the most fundamental and
important properties of neural coding: the neuron either depolarizes or it does not, and the
amplitude of the action potential is not graded according to the amplitude of the stimulating
event. In relation to the transduction process, this means that the graded receptor potential
generated by the hair cell will be translated not into a correspondingly graded neural event, but
into a discrete, all-or-none action potential on the auditory nerve.
Figure 4-27. Changes in voltage over time in an action potential.
Figure 4-27 shows what a typical action potential looks like. Action potentials are measured
by placing a very small recording electrode inside the membrane of an axon. Consequently, the
graph shows the changes that occur over time in the electrical potential within the cell at one
particular location on the axon. The same pattern is repeated at different points in time as the
disturbance propagates along the axon. The graph begins at the approximately -50 mV resting
potential of the neuron. The very rapid swing to about +40 mV occurs when the cell membrane
permeability increases, allowing positive ions to enter. This rapid swing from about -50 mV to
about +40 mV occurs over a brief interval of approximately 0.5 ms, and this portion of the action
potential is called the spike potential. An active process within the cell rapidly repolarizes the
neuron by pumping positive ions out through the cell membrane and, if the neuron remains
undisturbed, the electrical potential eventually returns to the resting potential of about -50 mV.
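The all-or-none law can be summarized in a few lines of code. The sketch below uses the approximate resting and peak voltages given above; the stimulus units and the threshold value are hypothetical, and the deterministic threshold is a deliberate simplification of real neural behavior.

```python
RESTING_MV = -50.0     # approximate resting potential (from the text)
SPIKE_PEAK_MV = 40.0   # approximate peak of the spike potential (from the text)


def peak_potential_mv(stimulus: float, threshold: float = 1.0) -> float:
    """Return the peak membrane potential produced by a stimulus.

    `stimulus` and `threshold` are in arbitrary, hypothetical units.
    The neuron either fires a full-sized spike or stays at rest.
    """
    return SPIKE_PEAK_MV if stimulus >= threshold else RESTING_MV


for stimulus in (0.5, 1.0, 5.0, 50.0):
    print(f"stimulus {stimulus}: peak {peak_potential_mv(stimulus)} mV")
```

A stimulus fifty times the threshold produces exactly the same spike as one that barely reaches it, just as a large match does not make a fuse burn hotter.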
Absolute and Relative Refractory Periods
There is one crucial aspect of the fuse analogy that does not apply to neurons: once a fuse
burns it cannot be relit. Neurons, on the other hand, repolarize shortly after generating an action
potential and can be stimulated to fire again. However, if the stimulating event occurs less than
about 1 ms after the generation of a spike potential, the fiber will not fire, no matter how strong
the stimulus. This interval of approximately 1 ms is called the absolute refractory period, and it
simply means that action potentials cannot occur more frequently than about once every
millisecond. This corresponds to a maximum firing rate of 1/(0.001 s) = 1,000 spikes per second, or 1,000
Hz. Following the absolute refractory interval is a longer interval of about 1-10 ms that is called
the relative refractory period. A neuron can fire during the relative refractory period, but the
threshold for stimulating the neuron is elevated. For example, at 2 ms following a neural spike
the neuron is capable of firing again since the absolute refractory period has been exceeded;
however, since the threshold is elevated a relatively strong stimulus is required. It is important to
appreciate that the firing of neurons is a probabilistic event, meaning that it has a random rather
than deterministic character. The probability of a second firing increases with either (a)
increases in the amplitude of the stimulating event (i.e., the neuron is more likely to fire when
strongly stimulated), or (b) increases in the time that elapses since the previous spike potential
(i.e., the neuron is more likely to fire if a long interval has elapsed since the previous spike
potential). As we will see later in this chapter, the concepts of refractory periods and the
probabilistic nature of neural firing patterns have important implications for neural coding of
Excitation versus Inhibition
When a neural spike arrives at a nerve ending, neurotransmitters are released into the
synaptic space where they are taken up by the dendrites of adjoining neurons. To this point we
have been speaking as though the release of neurotransmitters at synaptic junctures always had
the effect of stimulating an action potential. However, synaptic junctures may be either
excitatory – increasing the likelihood of an action potential in an adjoining neuron, or
inhibitory – decreasing the likelihood of an action potential in an adjoining neuron. These
inhibitory connections are quite important and play a central role in a class of contrast-enhancing
phenomena called lateral suppression or lateral inhibition, which we will not be discussing in
this text.
Signal Coding on the Auditory Nerve
Sensory Nerves as Encoders
We have seen that the receptor potential that is generated by the hair cells is a graded or
continuously varying signal that is a faithful model of the input signal, with the important
exception that it is half-wave rectified; that is, the "bottom half" of the signal is not represented
since the hair cells are excited by cilia shearing in one direction only (see Figure 4-24). In fact,
the signal exists in graded or continuous form at all points up to and including the hair cell
potential: (a) as continuous variations in instantaneous air pressure over time prior to the
tympanic membrane, (b) as continuous variations in instantaneous displacement over time all the
way from the tympanic membrane to the basilar membrane, and (c) as continuous variations in
instantaneous voltage over time at the hair cell. The electrical signals on the auditory nerve,
however, have a very different character since auditory nerve fibers carry a sequence of all-or-none on-off pulses.
The importance of this difference between the graded receptor potential and the discrete
on-off pulses on the auditory nerve cannot be overstated. What this means is that the
auditory nerve cannot represent the input signal in a completely straightforward way; for
example, the all-or-none law means that the auditory nerve cannot simply generate weak pulses
when the receptor potential is weak and strong pulses when the receptor potential is strong. The
auditory nerve must find some way to encode the receptor potentials generated at various places
along the basilar membrane that can be carried out with on-off pulses.
The key word here is encode, and to understand the nature of this encoding process and, in
fact, the basic structure of all encoding operations, it might be helpful to consider the kind of
encoding that occurs in the transmission of messages using Morse Code. In Morse Code the units
that must be encoded are letters and a few control characters such as STOP. This is accomplished
by assigning a code to each character consisting of a unique sequence of long and short electrical
pulses. Imagine a device consisting of an optical scanner, software that would recognize the
characters on a page of text, and an encoding circuit that would produce the appropriate sequence
of long and short electrical pulses for each character. The main point is that the device has done
more than simply convert from optical energy to electrical energy; it has translated or encoded
the message into an entirely different kind of language; that is, from the language of letter shapes
to the language of pulse widths.
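The encoding stage of the Morse Code device described above can be sketched in a few lines. The table below is a deliberately partial excerpt of International Morse Code, and the function name is illustrative; dots stand for short pulses and dashes for long pulses.

```python
# Partial International Morse Code table: only the letters used in the demo.
MORSE = {"E": ".", "O": "---", "P": ".--.", "S": "...", "T": "-"}


def encode(message: str) -> str:
    """Translate letters into dot/dash groups separated by spaces.

    Characters missing from the partial table raise a KeyError.
    """
    return " ".join(MORSE[ch] for ch in message.upper())


print(encode("stop"))  # ... - --- .--.
```

The output contains no trace of the letter shapes themselves; the message survives only as a pattern of short and long pulses, which is the sense in which the signal has been encoded rather than merely converted.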
Figure 4-28. The basilar membrane displacement patterns for two sinusoids differing in frequency
(top panels) and the auditory nerve firing rate patterns that would likely be associated with each of
these signals (bottom panels). When the amplitude of basilar membrane movement is high,
auditory nerve firing rate is high.
In the case of the auditory system, the "message" that needs to be encoded is the sound
wave arriving at the tympanic membrane or, alternatively, its spectrum. The signal is preserved
in half-wave rectified form in the hair cell receptor potential that drives the auditory nerve. The
kind of translation that is occurring is from a graded, continuous signal to a sequence of on-off
pulses. The question, then, is how might this continuous signal (or its spectrum) be coded on the
auditory nerve using on-off pulses? Since the pulses do not vary appreciably in amplitude, the
number of dimensions that might be exploited is fairly limited. Three characteristics of auditory
nerve firing patterns that might be exploited in this coding scheme are: (1) the time of occurrence
of the pulse, (2) the rate at which the neurons fire (i.e., whether a large or small number of
spikes occur in a given time interval), and (3) the physical location of the nerve fiber (i.e.,
whether the nerve fiber is connected to a hair cell on the basal end of the cochlea, the apical end,
or somewhere in between). These dimensions need not be treated separately. For example, it is
possible to examine the rate of neural activity for fibers connected at various positions along the
basilar membrane, which combines the firing rate parameter with the physical position
parameter. This is the essence of "place coding" or "rate-place coding," described below.
Rate-Place or Tonotopic Coding
To understand rate-place coding, it is necessary to recall that the basal end of the basilar
membrane, which is stiffer than the apical end, responds better to high frequencies than low
frequencies, while the opposite is true of the apical end of the basilar membrane. Consequently,
higher frequency pure tones will produce the largest basilar membrane movement amplitude
toward the base, while lower frequency pure tones will produce the largest basilar membrane
movement amplitude toward the apex (see Figures 4-15 and 4-16). The same basic principle
applies to complex signals consisting of many frequency components: the lower frequency
components of the input signal will be directed toward the apical end of the basilar membrane,
and the higher frequency components will be directed toward the basal end. The basic idea of
rate-place coding is that this spatial separation of frequency components will be reflected in the
pattern of auditory nerve activity. As shown in Figure 4-28, two signals differing in frequency
will show different patterns of 8th nerve electrical activity, with lower frequency signals showing
more activity at the apical end and higher frequency signals showing more activity at the basal
end. "Amount of neural activity" here is simply firing rate: the number of spikes per unit time in
neurons connected at various places along the cochlea. The basic idea, then, is that auditory
nerve activity toward the base codes high frequency, while auditory nerve activity at the apex
codes low frequency. The representation in Figure 4-28 can be viewed as a spectrum of sorts,
broadly analogous to a Fourier amplitude spectrum, with two differences: (1) the frequency scale
is backwards, since low frequencies are on the right and high frequencies are on the left, and (2)
the spectrum is quite coarse relative to the kind of spectrum that can be obtained by Fourier
analysis; that is, the pure tone produces activity over a rather wide area of the cochlea. The first
point is not relevant since Mother Nature has no bias toward reading from left to right, but the
second point may have considerable relevance. This issue will be discussed below.
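A hypothetical rate-place profile of the kind shown in Figure 4-28 can be generated with a short sketch. Everything numerical here is an assumption made for illustration: the number of fibers, the baseline and driven firing rates, and the symmetric bell-shaped spread, which ignores the asymmetry of the real traveling wave envelope.

```python
import math


def rate_place_profile(peak_position: float, n_fibers: int = 11,
                       baseline_rate: float = 10.0,
                       max_driven_rate: float = 150.0,
                       spread: float = 0.15):
    """Firing rates (spikes/s) for fibers spaced evenly along the cochlea.

    Positions run from 0.0 (apex) to 1.0 (base). All parameter values are
    hypothetical; the bell-shaped bump is a simplification.
    """
    rates = []
    for i in range(n_fibers):
        position = i / (n_fibers - 1)
        bump = math.exp(-((position - peak_position) / spread) ** 2)
        rates.append(baseline_rate + max_driven_rate * bump)
    return rates


low_tone = rate_place_profile(peak_position=0.2)   # peak near the apex
high_tone = rate_place_profile(peak_position=0.8)  # peak near the base

for name, rates in (("low tone", low_tone), ("high tone", high_tone)):
    print(name, [round(r) for r in rates])
```

Each tone drives a fairly wide band of fibers rather than a single one, which is the coarseness noted in point (2) above.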
The data shown in Figure 4-28 are hypothetical, and no such pattern has ever been directly
observed. The reason is that collection of this kind of data would require the simultaneous
recording of auditory nerve firing patterns in a large number of fibers at various positions.
Current methods do not exist for making these kinds of recordings simultaneously from a large
number of spatially separated neurons. Rate-place coding, however, can be inferred from two
techniques that make use of recordings from single auditory nerve fibers. One technique involves
the measurement of neural tuning curves from single neurons, and the other technique involves
measurement of frequency response curves, also from single neurons.
Figure 4-29. Neural tuning curves for auditory nerve fibers with three different characteristic
frequencies (CF). The threshold of the fiber is the lowest (i.e., sensitivity is greatest) at the
characteristic frequency of the fiber. Data from Kiang and Moxon (1974).
Neural Tuning Curves
Neural tuning curves are measured by placing an electrode into a neuron and determining
the threshold of the fiber over a wide range of signal frequencies. The threshold is simply the
signal intensity that is required to obtain a measurable response from the neuron. Neural tuning
curves for three different neurons are shown in Figure 4-29. Measuring the threshold of a neuron
is not simply a matter of increasing the signal intensity until the neuron fires. This is because
neurons will fire periodically even in the absence of an acoustic stimulus. The rate at which a
neuron will fire in relative quiet is called the spontaneous rate of the neuron. The threshold of a
neuron, then, is the intensity required for a neuron to fire at rates that are measurably above its
spontaneous rate.
Figure 4-30. Frequency response curves for two neurons with different characteristic frequencies.
The figure shows the firing rate of the two nerve fibers to sinusoids whose amplitude is always the
same, but whose frequency varies. Data from Rose et al. (1971).
The main point to be noted about the tuning curves in Figure 4-29 is that each neuron has a
much lower threshold at some frequencies than others. The sharp dip in each tuning curve
represents the lowest threshold and therefore the frequency at which the neuron is most sensitive.
This is called the characteristic frequency (CF) or best frequency (BF) of the neuron. The
terms characteristic frequency and best frequency need to be interpreted carefully. Finding that a
given neuron has a CF of 12,000 Hz, for example, does not reveal anything about the structural
properties of the neuron that cause it to "resonate" at 12,000 Hz; rather, the CF of 12,000 Hz
means that the neuron is connected to a hair cell that is located at the high frequency (basal) end
of the basilar membrane. In other words, the "best frequency" of a neuron is determined not by
its internal properties but by its location along the basilar membrane. If a neuron has a CF of
12,000 Hz it is because it is innervating a hair cell that is located at a point along the length of
the cochlea where the basilar membrane responds best to a frequency of 12,000 Hz. If this
12,000 Hz CF neuron were "unplugged" from its hair cell near the basal end of the cochlea and
attached to a hair cell located at the apical end, the CF would shift to a lower frequency since it
would then be driven by the movement of a portion of basilar membrane that is maximally
sensitive to lower frequencies. Consequently, although CF is measured by recording the
electrical activity of a nerve fiber, the best frequency of the fiber is actually controlled by the
mechanical properties of the basilar membrane.
Notice also that the tuning curves are asymmetrical; that is, the slopes are much sharper on
the high frequency side than the low frequency side. This asymmetry is a direct result of the
asymmetry in the envelope of the basilar membrane traveling wave. This point will be addressed
in the section below on auditory nerve frequency response curves. The relationship that exists
on the auditory nerve between CF and the physical location of the nerve fiber along the basilar
membrane is called tonotopic organization. As will be seen later in this chapter, tonotopic
organization is a fundamental architectural property of the auditory system. Tonotopic
organization is preserved not only on the auditory nerve but throughout the entire auditory
system, up to and including the auditory cortex.
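As a rough illustration of how a CF and the slope asymmetry might be read off a tuning curve, the sketch below uses an invented tuning curve (the threshold values are hypothetical, not data from Figure 4-29). The CF is taken as the frequency with the lowest threshold, and slopes are expressed in dB per octave on either side of it.

```python
import math


def characteristic_frequency(tuning_curve):
    """Return the frequency (Hz) with the lowest threshold.

    tuning_curve maps frequency in Hz to threshold in dB SPL."""
    return min(tuning_curve, key=tuning_curve.get)


def slope_db_per_octave(f1, f2, tuning_curve):
    """Absolute threshold change per octave between two measured frequencies."""
    return abs(tuning_curve[f2] - tuning_curve[f1]) / abs(math.log2(f2 / f1))


# Hypothetical tuning curve for a fiber tuned near 12,000 Hz.
curve = {500: 80, 1000: 70, 2000: 60, 4000: 45,
         8000: 30, 12000: 10, 14000: 50, 16000: 90}

cf = characteristic_frequency(curve)
low_side = slope_db_per_octave(8000, 12000, curve)    # slope below the CF
high_side = slope_db_per_octave(12000, 14000, curve)  # slope above the CF
print(cf, round(low_side, 1), round(high_side, 1))
```

With these invented values the high frequency slope comes out several times steeper than the low frequency slope, which is the qualitative pattern described in the text.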
Frequency Response Curves of Auditory Nerve Fibers
Another way to observe rate-place coding on the auditory nerve is to measure frequency
response curves of individual fibers. For reasons that are explained below, this method is more
revealing in some respects of the kind of frequency analysis that is carried out by the cochlea.
The method is conceptually identical to the one described in Chapter 3 for measuring the
frequency response of a filter. The method described earlier involves driving the filter with pure
tones of constant amplitude at various frequencies, from the lowest frequency of interest to the
highest frequency of interest. The frequency response curve is the amplitude of the signal at the
output of the filter as a function of the frequency of a constant amplitude input signal. The shape
of the frequency response curve tells us what frequencies will be allowed to pass through the
filter and what frequencies will be attenuated.
To measure the frequency response curve of an auditory nerve fiber, a recording electrode
is placed in the nerve fiber and its firing rate is measured as pure tones are delivered at a variety
of input frequencies; the amplitude of the input signal is held constant. The frequency response
curve of the fiber shows firing rate on the y axis and the frequency of a constant amplitude input
signal on the x axis. The firing rate measure is equivalent to the output amplitude measure that
was described in Chapter 3 for determining the frequency response curve of a filter.
Figure 4-30 shows frequency response curves for two neurons with different CFs. The
measurements were made by Rose et al. (1971) from a squirrel monkey using a rather low
presentation level of 45 dBSPL. (The low presentation level is quite important, as will be
discussed below.) One fiber has a CF of 900 Hz and the other fiber has a CF of 1,700 Hz. Note
that the two frequency response curves resemble bandpass filters; that is, the fibers respond with
maximum output at their CF, with fairly sharp drops on either side. Again, the tendency of these
fibers to respond with high firing rates at their CF does not reveal anything about the fibers
except that the 1,700 Hz CF fiber innervates a hair cell that is closer to the basal end of the
basilar membrane than the 900 Hz CF fiber. The filtering effect, then, can be attributed to the
frequency selective behavior of the basilar membrane and not the nerve fiber or the hair cell.
Findings such as those presented in Figure 4-30 have given rise to a view of the cochlea as
a filter bank; that is, a bank of some 3,000-3,500 overlapping bandpass filters of the kind shown
in Figure 4-30. This range of 3,000-3,500 comes from the approximate number of hair cell
channels in the cochlea, with each channel consisting of 1 IHC and 3-4 OHCs.[4] Each of these
channels can be thought of as analogous to the bandpass filter that is used on a radio tuner: each
channel allows a band of energy through, while attenuating signal components of higher or lower
frequency. By measuring the output of each of these channels (i.e., the firing rate), a spectrum
could be reconstructed. Since each channel is maximally sensitive to signal components at the
fiber CF, the firing rate at each CF reflects the amount of signal energy at that frequency. This is
the essence of what is meant by rate-place coding: the firing rate at each channel codes the
amount of signal energy at the CF corresponding to that channel.
[4] In terms of the signals that are generated on the auditory nerve, a channel can essentially be considered to be a single IHC since the great
majority of afferent auditory nerve fibers innervate IHCs rather than OHCs.
Figure 4-31. Frequency response curves for a single auditory nerve fiber at eight different signal
intensities. Note that the frequency response curves are relatively narrow at low presentation levels
but become very broad at intensities that are typical of speech. Data from Rose et al. (1971).
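The filter-bank view of rate-place coding described above can be sketched in a few lines. This is a toy model under stated assumptions, not a model of the cochlea: each channel is given a symmetric Gaussian passband on a log-frequency axis (real fibers are asymmetric and level dependent), firing rate is treated as a linear sum of the energy passed, and the channel spacing, bandwidth, and test signal are all invented for illustration.

```python
import math


def channel_gain(cf, freq, bandwidth_octaves=0.1):
    """Assumed Gaussian passband: gain of a channel with the given CF
    for a component at freq, as a function of distance in octaves."""
    distance = math.log2(freq / cf)
    return math.exp(-0.5 * (distance / bandwidth_octaves) ** 2)


def rate_place_profile(cfs, components):
    """Return one 'firing rate' per channel for a signal given as a list
    of (frequency_hz, amplitude) sinusoidal components."""
    return [sum(a * channel_gain(cf, f) for f, a in components) for cf in cfs]


# 73 hypothetical channels from 100 Hz to 6,400 Hz in semitone steps.
cfs = [100 * 2 ** (i / 12) for i in range(73)]
signal = [(500, 1.0), (2000, 0.5)]   # two-component test signal

profile = rate_place_profile(cfs, signal)
best_cf = cfs[profile.index(max(profile))]
print(round(best_cf))
```

Plotting `profile` against `cfs` would show two peaks, one near 500 Hz and a smaller one near 2,000 Hz, which is the sense in which the pattern of firing rates across channels could stand in for a spectrum.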
Problems with Rate-Place Coding
There is no doubt that filtering of the general type that is shown in Figure 4-30 takes place
in the cochlea, but does the auditory system actually derive a spectrum using rate-place coding?
Opinions on this question have been divided for many years. One of the main questions is
whether the bandpass filters that make up the cochlear filter bank are sufficiently selective to
account for what is known about the frequency discrimination abilities of listeners. The term
"selective" here simply means narrow; that is, a selective or narrow band filter passes a narrow
band of frequencies, with sharp slopes on either side. If frequency discrimination ability can be
explained on the basis of the cochlear filter bank, then the bandpass filters need to be very
narrow since frequency discrimination abilities are stunningly good: in the middle portion of the
spectrum, one just noticeable difference in frequency corresponds to a distance along the basilar
membrane of approximately 10 microns, or roughly the diameter of a single inner hair cell
(Davis and Silverman, 1970). Consequently, in order for rate-place coding to work, the
bandwidths of the filters at each channel would have to be sufficiently narrow that relatively
little energy is allowed to "spill" into an adjacent channel.
There is some reason to believe that the cochlear filter bank is too broadly tuned to explain
frequency discrimination abilities. Although frequency response curves tend to be fairly narrow
when signal intensities are low, there is very good evidence that filter bandwidths become quite
broad at even moderate signal intensities. Figure 4-31 shows a family of frequency response
curves for an individual auditory nerve fiber from a squirrel monkey from a study by Rose et al.
(1971). This particular auditory nerve fiber has a CF of 1,700 Hz. The eight separate curves
represent the frequency response curve measured at eight different signal levels, every 10 dB
from 25 dBSPL to 95 dBSPL. Notice first of all that the frequency response curves reveal a certain
amount of frequency selectivity; that is, the fiber responds better to frequencies at or near the
1,700 Hz CF than at other frequencies, despite the fact that the intensity of the input signal is
held constant for each of the individual curves. However, notice that the degree of frequency
selectivity is strongly affected by signal level. Specifically, the frequency response curves
become much more broad (i.e., less frequency selective) at higher signal intensities. For
example, at 35-45 dBSPL the shapes of the frequency response curves resemble a bandpass filter,
with fairly sharp drops in firing rate on either side of the CF. However, at levels that are more
typical of speech (e.g., 65-85 dBSPL), the filter shapes become considerably broader. This is
especially true on the low frequency side of the frequency response curves; that is, the fibers
show considerable activity at frequencies that are much lower than the CF. In fact, at the higher
presentation levels the filter shapes begin to resemble lowpass filters more than bandpass filters.
What this means is that the relationship between place and frequency -- which is the essence of
place coding -- is not nearly as strong at typical speech levels as it is at very low stimulus levels.
Some theorists have argued that the frequency selectivity that is shown in these frequency
response curves is far too coarse to account for the excellent frequency discrimination ability of
listeners.
The final point that needs to be discussed regarding the frequency response curves in
Figure 4-31 is the asymmetry. The slopes of the frequency response curves are considerably
sharper on the high frequency side (~100-500 dB/octave) than on the low frequency side (~8-12
dB/octave). This is the same kind of asymmetry that was seen earlier in the neural tuning curve
data, and both effects are due to the asymmetry in the envelope of the basilar membrane
traveling wave. This may seem counterintuitive since the traveling wave envelope has a sharper
slope on the low frequency (apical) side than the high frequency (basal) side. To understand why
this does, in fact, make sense, suppose that we were to measure the frequency response of a fiber
with a CF of 1,000 Hz. In measuring the frequency response curve of the fiber, we begin with
low frequency pure tones and move to higher frequencies, each time holding the intensity
constant and measuring the firing rate of the neuron. Since we are progressing from low
frequencies to high frequencies, the point of maximum amplitude in the basilar membrane
traveling wave moves systematically from the apex to the base. Figure 4-32 shows what the
traveling wave envelope would look like for three sinusoids: (1) a pure tone at the CF of the fiber
(1,000 Hz), (2) a pure tone that is lower in frequency than the CF, and (3) a pure tone that is
higher in frequency than the CF. The arrow in roughly the center of the figure shows the
approximate location of the 1,000 Hz CF fiber that is being recorded. The firing rate at this
location will be strongly correlated with the amplitude of the traveling wave at this position on
the cochlea. The main point to notice is that the pure tone that is lower than the CF would be
expected to cause much more activity in the 1,000 Hz CF fiber than the pure tone that is higher in
frequency than the CF due to the asymmetry in the traveling wave envelope. This is why the
slopes of the frequency response curves in Figure 4-31 tend to be much sharper on the high
frequency side than the low frequency side.
Figure 4-32. Basilar membrane traveling wave envelopes for signals at three different frequencies.
If a fiber with a 1,000 Hz CF is being recorded, the lower frequency signal will produce more
activity in the 1,000 Hz CF fiber than the higher frequency signal. This is why tuning curves and
frequency response curves have sharper slopes on the high frequency side than the low frequency
side.
Summary of Rate-Place Coding
In summary, the essence of rate-place coding is that the vibratory characteristics of the
basilar membrane are such that high frequency signals tend to cause greater neural activity in
auditory nerve fibers connected at the base than the apex, while the opposite is true of low
frequency signals. This relationship between the characteristic frequency or best frequency of an
auditory nerve fiber and spatial location along the basilar membrane is called tonotopic
organization. Tonotopic organization may be observed experimentally by recording the threshold
of individual nerve fibers at different frequencies, resulting in a neural tuning curve. It can also
be observed by measuring the frequency response curves of individual nerve fibers.
Experimental findings using these two techniques have given rise to the view of the cochlea as a
filter bank, with each of some 3,000-3,500 channels passing a band of frequencies. An auditory
spectrum might be coded as variations in the firing rate at the output of each of these channels.
However, there is some uncertainty about whether the filter bank provides enough frequency
resolution to account for the excellent frequency discrimination that is shown by listeners.
The basic idea behind synchrony coding is that the period of the input signal will be
preserved in the period that elapses between successive spikes on the auditory nerve fibers. For
example, if the input signal has a period of 10 ms (f = 100 Hz), the pulse train produced on the
auditory nerve will tend to have an interspike interval of 10 ms. This type of coding, then,
exploits the time of occurrence parameter in neural firing patterns. The principle is shown in a
highly simplified form in Figure 4-33: a 100 Hz signal will produce a 100 Hz pulse train on the
auditory nerve or, stated differently, the interspike interval will match the period of the input
signal. According to
synchrony coding, the auditory spectrum is assumed to be derived not by the filtering action of
the cochlea, but by the measurement of interspike intervals in the central nervous system, where
neural firing patterns are analyzed.
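The relationship between interspike interval and signal frequency described above is simply f = 1/T. A minimal sketch, using an idealized spike train in which a spike occurs on every cycle (the spike times are invented for illustration):

```python
def frequency_from_spike_times(spike_times_s):
    """Estimate signal frequency (Hz) as the reciprocal of the mean
    interspike interval of a list of spike times given in seconds."""
    intervals = [b - a for a, b in zip(spike_times_s, spike_times_s[1:])]
    mean_interval = sum(intervals) / len(intervals)
    return 1.0 / mean_interval


# Idealized spike train: one spike every 10 ms.
spikes = [0.000, 0.010, 0.020, 0.030, 0.040]
estimate = frequency_from_spike_times(spikes)
print(round(estimate, 3))
```

An interspike interval of 10 ms corresponds to an estimate of about 100 Hz, matching the example in the text.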
The very simple kind of synchrony coding that is shown in Figure 4-33 is a very old idea
that dates back to Rutherford (1886). When the theory was first proposed the limits imposed on
maximum firing rates by neural refractory periods were not known, and it was thought that an
individual nerve fiber could fire at the very high rates that prevail at the high end of the human
frequency range (~20 kHz). Recall that the absolute refractory period is approximately 1 ms,
meaning that a neuron cannot fire at a rate exceeding about 1,000 Hz. An individual nerve fiber,
in fact, can sustain this 1,000 Hz top rate only for very brief bursts. For sustained signals, the
relative refractory period must also be taken into account. The relative refractory period is a
longer interval ranging from about 1-10 ms, and when this interval is taken into account, the
maximum sustained firing rate of an individual fiber might average perhaps 300-400 Hz or less.
Figure 4-34. The volley principle. Adapted from Wever (1949).
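The firing rate limits discussed above follow from taking the reciprocal of the shortest possible interval between spikes. In the sketch below, the 1 ms value is the absolute refractory period given in the text, while the 3 ms effective recovery time is an assumed value chosen from within the 1-10 ms relative refractory range purely for illustration.

```python
def max_firing_rate_hz(min_interspike_interval_s):
    """Upper bound on firing rate given the shortest interval between spikes."""
    return 1.0 / min_interspike_interval_s


burst_limit = max_firing_rate_hz(0.001)      # absolute refractory period of 1 ms
sustained_limit = max_firing_rate_hz(0.003)  # assumed 3 ms effective recovery time
print(burst_limit, round(sustained_limit))
```

A 1 ms minimum interval gives the 1,000 Hz ceiling, and an effective recovery time of a few milliseconds gives a sustained rate in the neighborhood of the 300-400 Hz figure cited above.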
One implication of these firing rate limits would seem to be that the synchrony principle
could accurately code signal frequencies only for very low frequencies of perhaps 300 Hz or
lower. However, a rather simple elaboration of synchrony coding called the volley principle
allows this low frequency limit to be greatly exceeded. The basic idea behind the volley
principle, which was proposed by Wever (1949), is shown in Figure 4-34. The assumption made
by Wever is that, while each individual neuron may not be able to fire fast enough to produce
one pulse for every cycle of the waveform, the sum of the activity of several neurons will show a
train of pulses whose interspike interval matches the period of the input signal.[5] According to
this view, then, the interspike interval, which codes the period of the input signal, must be
measured not from an individual nerve fiber, but from the summed output of a group of nerve
fibers.
This coding scheme does not depend on individual nerve fibers firing at the same
frequency as the input signal, but it does depend on the ability of nerve fibers to remain
synchronized to the input waveform. In other words, the neuron may fire for one cycle, miss
several cycles, fire on another cycle, miss several more, and so on. However, for the coding
scheme to work, the spikes that do occur need to remain in synchrony with the input waveform.
Remaining in synchrony means that the fiber tends to fire at roughly the same point in the cycle.
For example, a fiber out of synchrony might fire once at the positive peak, next at zero crossing,
again at a negative peak, and so on. Auditory nerve fibers appear to be able to maintain
synchrony for signal frequencies up to approximately 4,000-5,000 Hz, although there is no sharp
cutoff. Consequently, synchrony coding, along with the summing-across-fibers assumption
provided by the volley principle, appears to be capable of coding frequency up to about
4,000-5,000 Hz.
[5] Figure 4-34, which has been widely reproduced in many texts in auditory physiology, does a rather good job of capturing the basic idea
behind Wever's volley principle. However, the figure is misleading in one respect. The figure shows an unrealistically orderly firing pattern for
the individual neurons, where each neuron fires, misses exactly four cycles, fires again, and so on. The adjacent neurons are also shown to be
offset from one another by exactly one pulse. Such orderly behavior is not actually obtained experimentally, but the basic principle of measuring
the interspike interval from a group of neurons does not depend on this unrealistically orderly behavior.
Figure 4-35. An illustration of pulse coding. The figure shows the instantaneous probability of a pulse on
an individual auditory nerve fiber (i.e., the number of spikes occurring in a given time interval) as a
function of time. Superimposed on the pulse probability function, which is plotted as a bar graph, is the
input signal. The main lesson of this figure is that instantaneous pulse probability is high when
instantaneous signal amplitude is high, and vice versa. Consequently, the pulse probability function
preserves the shape of the input waveform, except that the function, like the receptor potential, is half-wave
rectified. This figure is a stylized version of data from the Rose et al. (1971) measurements of 8th nerve
firing patterns in the squirrel monkey. [replace w/ scan of Rose puretone and complex data??]
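The volley principle can be sketched with a deliberately idealized simulation. The sketch adopts the orderly firing pattern of Figure 4-34 (each fiber fires on every fifth cycle, with adjacent fibers offset by one cycle), which real fibers do not show; it is used here only because it makes the arithmetic easy to follow. The function names and the 2,000 Hz test frequency are assumptions for illustration.

```python
def volley_spike_times(frequency_hz, n_fibers, n_cycles):
    """Return one list of spike times (s) per fiber. Fiber k fires on cycles
    k, k + n_fibers, k + 2*n_fibers, ..., always at the same phase."""
    period = 1.0 / frequency_hz
    return [[cycle * period for cycle in range(k, n_cycles, n_fibers)]
            for k in range(n_fibers)]


def mean_rate_hz(spike_times_s):
    """Reciprocal of the mean interspike interval of a sorted spike train."""
    intervals = [b - a for a, b in zip(spike_times_s, spike_times_s[1:])]
    return 1.0 / (sum(intervals) / len(intervals))


fibers = volley_spike_times(2000, n_fibers=5, n_cycles=50)
single_fiber_rate = mean_rate_hz(fibers[0])

# Sum across fibers by pooling all spike times into one train.
pooled = sorted(t for fiber in fibers for t in fiber)
pooled_rate = mean_rate_hz(pooled)
print(round(single_fiber_rate), round(pooled_rate))
```

Each fiber fires at only about 400 Hz, well within the sustained rate limit, yet the pooled spike train has an interspike interval equal to the 0.5 ms period of the 2,000 Hz signal.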
Rate-place coding and synchrony coding are both very old ideas with long and complex
histories. The earliest well formulated version of place theory was proposed by von Helmholtz
(1857) more than a century ago, and later significantly modified and extended in the Nobel
Prize-winning work of von Bekesy (1960) that culminated in the traveling wave theory that has
been discussed at great length in this chapter. Synchrony coding has a nearly equally long
history, dating back to the first well formulated version described by Rutherford (1886). For
many years the principal debate in auditory physiology centered around the question of which of
these two divergent approaches was correct. A view that has become rather common is that these
two coding schemes are not incompatible. A compromise view called place-synchrony coding
or place-volley theory holds that frequency is coded both by rate-place mechanisms and by
synchrony mechanisms. In the low frequencies, where synchrony is well maintained, synchrony
coding is thought to be dominant, while in the higher frequencies where synchrony is not well
maintained, place coding is dominant. There is assumed to be no abrupt shift between synchrony
coding and place coding, and for a fairly broad range of frequencies in the middle of the
spectrum, perhaps from 1,500-5,000 Hz, both kinds of coding schemes may contribute to
the coding of frequency.
There is one additional coding scheme that may play an important role in auditory analysis.
The scheme is called pulse coding, and it typically receives little or no attention in introductory
discussions in auditory physiology and, in the view of the authors, has perhaps received too little
attention among professional scientists. The basic idea behind pulse coding is shown in Figure
4-35, which displays a slightly simplified version of data from a study by Rose et al. (1971). In this
study recordings were made from single auditory nerve fibers from a squirrel monkey. Each of
these figures shows pulse probability as a function of time; that is, the probability that a pulse
will occur at any given point in time. (Note that this scheme, like synchrony coding, exploits the
time of occurrence parameter of auditory nerve firing patterns.) Superimposed on each of these
figures is the signal that is being presented to the animal. The signal is the smooth curve, and
pulse probability is shown as a bar graph. Pulse probability is simply the number of spikes that
occur in a given small interval of time. The similarity between the shape of the pulse probability
function and the shape of the input signal is striking. What the Rose et al. data show is that the
instantaneous probability of an auditory nerve pulse is directly and linearly proportional to the
instantaneous amplitude of the signal that is being coded. In other words, if the signal amplitude
at a given instant is high, the fiber is very likely to fire and, conversely, if the signal amplitude at
a given instant is low, the fiber is not very likely to fire. The important thing to note is the
stunning similarity between the pulse probability code and the input waveform. The pulse
probability function can be seen as a very direct coding of the graded receptor potential which, in
turn, is a faithful representation of the input signal.[6] (Note that both the receptor potential and the
pulse probability function are half-wave rectified. Again, this is because hair cells are stimulated
to generate a receptor potential as a result of cilia shearing in one direction only. Since it is the
receptor potential that stimulates the generation of action potentials in neurons, spike probability
is close to zero in one half of each cycle.)
[6] A point worth noting about the Rose et al. data is that the pulse probability functions were derived by summing the activity of an individual fiber
over many hundreds of cycles of the input waveform. Being a probabilistic process, the pulse code will be very ragged looking on any individual
fiber for a single presentation of a stimulus. Since a listener perceives sound quality in real time based on single presentations of stimuli, it must
be the case that the pulse code is derived not on a single nerve fiber but by summing pulse probabilities over many nerve fibers. Calculations
suggest that the pulse code can be accumulated by summing the activity of something on the order of 1,000 fibers (Houde & Hillenbrand, 1996).
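The pulse code described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reconstruction of the Rose et al. analysis: spike probability is taken to be a linear function of the half-wave rectified instantaneous amplitude of a sinusoid, refractory effects are ignored, and the peak probability, bin count, and number of cycles are invented values. Spikes are summed over many cycles of the waveform to build a histogram of spike counts as a function of position within the cycle.

```python
import math
import random


def pulse_probability(amplitude, peak_probability=0.5):
    """Assumed linear, half-wave rectified mapping from instantaneous
    amplitude (-1 to 1) to the probability of a spike in one time bin."""
    return peak_probability * max(0.0, amplitude)


def period_histogram(n_bins=20, n_cycles=2000, seed=0):
    """Count spikes in each of n_bins phase bins, summed over n_cycles
    cycles of a sinusoidal input."""
    rng = random.Random(seed)
    counts = [0] * n_bins
    for _ in range(n_cycles):
        for b in range(n_bins):
            phase = (b + 0.5) / n_bins                  # bin center, 0 to 1 cycle
            amplitude = math.sin(2 * math.pi * phase)   # input signal
            if rng.random() < pulse_probability(amplitude):
                counts[b] += 1
    return counts


counts = period_histogram()
print(counts)
```

The counts trace out the positive half of the sinusoid and are zero throughout the negative half, which is the half-wave rectified shape described in the text. Reducing `n_cycles` to a handful makes the histogram ragged, in line with the point that the code becomes clear only after summing over many cycles or many fibers.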
What is the pulse code good for? Unfortunately, as with many aspects of auditory analysis,
it is not currently known. In very general terms, one possibility is that the primary function of the
cochlea is simply to preserve the detailed shape of the input signal (in half-wave rectified form),
and the auditory nerve serves to code this signal with on-off pulses using the pulse-probability
code. The pulse-coded signal is then passed on to the central nervous system for analysis.
According to this view, the spectrum analysis is carried out in part not by the cochlea but by
purely neural mechanisms in the central nervous system. The neural mechanisms that might be
involved in carrying out this spectrum analysis at central levels – if they exist at all – are
currently not known. Neural circuits that are capable of deriving something equivalent to a
Fourier amplitude spectrum from the pulse-coded signal have been hypothesized, but it is not
currently known if these circuits exist (see Houde & Hillenbrand, 1997).
CHAPTER NOT QUITE FINISHED. STILL NEEDED ARE: (1) A QUICK TOUR OF
CENTRAL AUDITORY PATHWAY AND (2) A SUMMARY.
References
Batteau, D.W. (1967). The role of the pinna in human localization. Proceedings of the Royal
Society, Series B, 168, no. 1011, 158-180.
Bekesy von, G., and Rosenblith, W.A. (1958). The mechanical properties of the ear. In S.S.
Stevens (Ed.) Handbook of Experimental Psychology. New York: John Wiley & Sons.
Davis, H. (1965). A model for transducer action in the cochlea. Cold Spring Harbor Symposium
on Quantitative Biology, 30: Sensory Receptors, Cold Spring Harbor, NY: Cold Spring Harbor
Laboratory of Quantitative Biology, 181-190.
Freedman, S.J., & Fisher, H.G. (1968). The role of the pinna in auditory localization. The
Neuropsychology of Spatially Oriented Behavior. S.J. Freedman (Ed.), Homewood, IL: Dorsey
Press (chapter 8).
Fleming, N. (1939). Resonance in the external auditory meatus. Nature, 142, 642-643.
Houde, R.A., and Hillenbrand, J.M. A comb filter theory of auditory spectrum analysis. Under
review. [update this]
Spoendlin, H., and Schrott, A. (1989). Analysis of the human auditory nerve. Hearing
Research, 43, 25-38.
Wiener, F.M., and Ross, D.A. (1946). Pressure distribution in the auditory canal in a progressive
sound field. Journal of the Acoustical Society of America, 18, 401-408.
Wier, C.C., Jesteadt, W., & Green, D.M. (1977). Frequency discrimination as a function of
frequency and sensation level. Journal of the Acoustical Society of America, 61, 178-184.