In 1964, cognitive scientist Roger Shepard published a short paper describing a sound that appeared to rise in pitch indefinitely, cycling back to its starting point without ever resolving. Listeners could not agree on how many steps the scale contained, whether it was ascending or descending, or where it “ended” — because it did not end. The Shepard tone was not a trick or an illusion in any pejorative sense; it was the correct perceptual response to a carefully engineered spectral structure. To understand why it works, you first need to understand how the ear and brain decode pitch in ordinary sounds — and that story begins with the harmonic series.
I. Additive Synthesis and the Harmonic Series
Every musical instrument produces a complex, periodic waveform. Fourier’s theorem guarantees that any periodic signal can be decomposed into a sum of sinusoids whose frequencies are integer multiples of a fundamental frequency f₀. These integer multiples are called harmonics or partials, and together they define the spectrum of the sound. A flute and a violin playing the same A440 produce the same fundamental, but wildly different harmonic amplitudes — which is why they sound different.
Fourier series of a periodic signal x(t):
x(t) = A₀ + Σₙ [ Aₙ cos(2π n f₀ t) + Bₙ sin(2π n f₀ t) ]
= A₀ + Σₙ Cₙ cos(2π n f₀ t + ϕₙ)
where:
n : harmonic number (1, 2, 3, ...)
f₀ : fundamental frequency (Hz)
Cₙ = √(Aₙ² + Bₙ²) : amplitude of n-th harmonic
ϕₙ = atan2(-Bₙ, Aₙ) : phase of n-th harmonic
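The equivalence of the two forms can be checked numerically. The Python sketch below is illustrative only: the function names and the sample coefficients (Aₙ = 3, Bₙ = 4) are assumptions, not values from any real instrument.

```python
import math

def to_amplitude_phase(a_n, b_n):
    """Convert cosine/sine coefficients (A_n, B_n) to amplitude and phase (C_n, phi_n)."""
    c_n = math.hypot(a_n, b_n)        # C_n = sqrt(A_n^2 + B_n^2)
    phi_n = math.atan2(-b_n, a_n)     # phi_n = atan2(-B_n, A_n)
    return c_n, phi_n

def harmonic_ab(a_n, b_n, n, f0, t):
    """One harmonic in the form A_n cos(2 pi n f0 t) + B_n sin(2 pi n f0 t)."""
    w = 2.0 * math.pi * n * f0 * t
    return a_n * math.cos(w) + b_n * math.sin(w)

def harmonic_cphi(c_n, phi_n, n, f0, t):
    """The same harmonic in the form C_n cos(2 pi n f0 t + phi_n)."""
    w = 2.0 * math.pi * n * f0 * t
    return c_n * math.cos(w + phi_n)

# Hypothetical coefficients for the 2nd harmonic of a 440 Hz tone.
c, phi = to_amplitude_phase(3.0, 4.0)
print(c)  # 5.0
```

Evaluating harmonic_ab and harmonic_cphi at the same instant should give the same value up to rounding error, which is a quick way to confirm the sign convention of the phase.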
Waveform shapes from harmonic structure:
Sine wave : only n=1 (pure tone)
Square wave : odd harmonics only; Cₙ = 4/(πn)
Sawtooth : all harmonics; Cₙ = 2/(πn) (alternating sign)
Triangle : odd harmonics only; Cₙ = 8/(π²n²)
Additive synthesis is the direct practical application of this: build any timbre you want by summing sinusoids. Want a richer, brassier sound? Add more high-frequency harmonics. Want something hollow and clarinet-like? Suppress the even harmonics. The harmonic series is the palette and the Fourier coefficients are the colours.
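As a minimal sketch of additive synthesis, the following Python function sums the odd harmonics of a square wave using the amplitudes Cₙ = 4/(πn) listed above. Sine phase for every partial is an assumption made here so that the sum converges to a square wave that switches between -1 and +1; the function name and parameters are illustrative.

```python
import math

def square_partial_sum(t, f0, n_harmonics):
    """Approximate a +/-1 square wave at time t by summing its first odd harmonics.

    Each partial n = 1, 3, 5, ... has amplitude 4 / (pi * n) and sine phase (assumed).
    """
    total = 0.0
    for k in range(n_harmonics):
        n = 2 * k + 1                                  # odd harmonic number
        total += 4.0 / (math.pi * n) * math.sin(2.0 * math.pi * n * f0 * t)
    return total

f0 = 440.0
quarter_period = 1.0 / (4.0 * f0)   # middle of the positive half-cycle
for count in (1, 5, 50):
    print(count, square_partial_sum(quarter_period, f0, count))
```

With a single harmonic the value at the middle of the half-cycle is 4/π (about 1.27); as more partials are added it settles toward 1, the height of the ideal square wave.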
What the brain does with a spectrum
The auditory system does not directly measure frequency. The basilar membrane in the cochlea performs a mechanical frequency analysis — different positions along the membrane resonate to different frequencies, high frequencies at the base and low frequencies at the apex. The brain receives a spatial map of spectral energy, not a single pitch value. It must then infer the perceived pitch from that map.
For a harmonic sound, the process is called fundamental frequency estimation. The auditory system looks for a common divisor of the spectral peaks. If you hear partials at 400 Hz, 600 Hz, 800 Hz, and 1000 Hz, the greatest common divisor is 200 Hz, and you perceive a pitch of 200 Hz, even if there is no energy at 200 Hz whatsoever. This phenomenon, called the missing fundamental, is why a small telephone speaker can reproduce bass voices intelligibly despite being physically incapable of producing 100 Hz vibrations. The harmonics above 300 Hz are present; the brain fills in the rest.
The missing fundamental is why the earliest telephone engineers could transmit intelligible speech at only 300–3400 Hz. Male voices have fundamentals around 85–180 Hz, far below the passband — but the harmonics that do make it through are enough for the auditory system to reconstruct the correct perceived pitch.
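The common-divisor idea can be illustrated with a toy calculation. This Python sketch assumes exact, integer-valued partial frequencies; the real auditory system tolerates mistuning and noise, so this is an illustration of the arithmetic rather than a model of hearing.

```python
from functools import reduce
from math import gcd

def implied_fundamental(partials_hz):
    """Greatest common divisor of integer-valued partial frequencies, in Hz."""
    return reduce(gcd, partials_hz)

# The example from the text: no energy at 200 Hz, yet 200 Hz is the implied pitch.
print(implied_fundamental([400, 600, 800, 1000]))   # 200

# A hypothetical 100 Hz voice through a 300-3400 Hz telephone passband:
# only harmonics 3 to 34 survive, but their common divisor is still 100 Hz.
telephone_partials = list(range(300, 3401, 100))
print(implied_fundamental(telephone_partials))      # 100
```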
II. The Shepard-Risset Glissando — An Infinite Staircase
Roger Shepard’s original discrete tone was extended by composer Jean-Claude Risset into a continuous, endlessly rising or falling glissando — the Shepard-Risset glissando. To understand it, picture a barber’s pole: the stripes always appear to move upward, but the pole is finite. The Shepard tone is the auditory equivalent.
The construction requires two simultaneous manipulations: a pitch shift and an amplitude envelope shaped like a bell curve across the octave register.
Shepard tone construction (discrete version):
N tones, each separated by one octave: fₙ = f₀ × 2ⁿ
All tones share the same pitch-class (chroma), rising in unison.
Amplitude of tone n at time t:
Aₙ(t) = G( log₂(fₙ(t)) )
where G is a bell-shaped (Gaussian) envelope over log-frequency:
G(x) = exp( -(x - μ)² / (2σ²) )
μ : centre of the bell (fixed, typically the middle octave)
σ : width of the bell (typically 1.5 octaves)
As pitch-class rises by a semitone:
- the highest tone fades out (exits the top of the bell)
- a new lowest tone fades in (enters the bottom of the bell)
- the perceived pitch rises
- the spectral centre of mass stays constant
- looping back to the start is perceptually seamless
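The construction above can be sketched in a few lines of Python. The defaults here are assumptions chosen for illustration: eight components starting at 27.5 Hz, twelve steps per octave, σ = 1.5 octaves, and μ placed at the middle of the covered range (which works out to 440 Hz).

```python
import math

def shepard_components(step, n_octaves=8, f_base=27.5, sigma=1.5, steps_per_octave=12):
    """Return (frequency_hz, amplitude) pairs for one discrete Shepard tone.

    step is the pitch-class index; it wraps modulo steps_per_octave,
    so step 12 produces exactly the same spectrum as step 0.
    """
    mu = math.log2(f_base) + n_octaves / 2.0               # bell centre, in log2(Hz)
    shift = (step % steps_per_octave) / steps_per_octave   # fraction of an octave
    components = []
    for n in range(n_octaves):
        log_f = math.log2(f_base) + n + shift              # octave-spaced in log-frequency
        amp = math.exp(-((log_f - mu) ** 2) / (2.0 * sigma ** 2))
        components.append((2.0 ** log_f, amp))
    return components

def log_centroid(components):
    """Amplitude-weighted mean of log2(frequency): a simple proxy for spectral height."""
    total = sum(a for _, a in components)
    return sum(a * math.log2(f) for f, a in components) / total

for step in (0, 6, 11, 12):
    print(step, round(log_centroid(shepard_components(step)), 3))
```

With these assumed settings every component rises by a semitone per step, yet the log-frequency centroid stays within a few hundredths of an octave of μ. The outermost components sit at roughly 3% of the peak amplitude, which is why the wrap from step 11 back to step 0 is hard to hear; a wider range or narrower bell would make it fainter still.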
The key insight is the separation of two distinct pitch attributes that normally change together: chroma (pitch class: C, D, E… independent of octave) and height (the absolute register, low or high). In an ordinary melodic scale, both change together — when you ascend a semitone, both the chroma and the height increase. The Shepard tone changes only the chroma. The amplitude envelope ensures that the overall spectral centre of mass — and therefore the perceived height — stays fixed, while the chroma completes its twelve-step cycle and returns to the starting point.
Why the brain is deceived
The auditory system’s pitch extraction algorithm has evolved in a world where chroma and height are coupled. When they are decorrelated, the brain uses the rising chroma as its primary cue and extrapolates a rising pitch, ignoring the stable height cue from the overall spectral envelope. The illusion is robust: it persists even when listeners are told exactly how it works. Interestingly, when two Shepard tones are exactly a tritone apart, the direction of the step is ambiguous, and which direction listeners hear (rising or falling) has been reported to correlate with the language or dialect they grew up with. This effect is known as the tritone paradox.
Continuous (Risset) glissando:
Replace discrete tones with continuous sinusoidal sweeps.
Each component sweeps upward by one octave in time T,
ending exactly where the component above it began. The top component wraps to the bottom of the range, where G has faded it to near silence, so the reset is inaudible.
Multiple components staggered in time, each modulated by G.
Perceived pitch: always rising (or always falling in inverted version)
Actual spectral centre: constant
Period of illusion: T seconds (seamless loop)
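A continuous glissando can be sketched with a bank of phase-accumulating oscillators. The version below is one possible realisation under stated assumptions, not a reference implementation: each component drifts upward through the whole bell and wraps from top to bottom, where the envelope has already faded it out. The parameter names, the 16 kHz sample rate and the 10-second cycle are all illustrative choices.

```python
import math

def positions(t, cycle_s, n_components):
    """Sorted component positions, in octaves above the base frequency, at time t."""
    return sorted((k + t / cycle_s) % n_components for k in range(n_components))

def risset_samples(duration_s, cycle_s=10.0, n_components=8, f_base=27.5,
                   sigma=1.5, sample_rate=16000):
    """Generate samples of a rising Shepard-Risset glissando (values within -1..1)."""
    mu = n_components / 2.0                 # bell centre, in octaves above f_base
    phases = [0.0] * n_components           # one running phase per oscillator
    out = []
    for i in range(round(duration_s * sample_rate)):
        t = i / sample_rate
        sample = 0.0
        for k in range(n_components):
            pos = (k + t / cycle_s) % n_components         # rises one octave per cycle
            freq = f_base * 2.0 ** pos
            amp = math.exp(-((pos - mu) ** 2) / (2.0 * sigma ** 2))
            phases[k] += 2.0 * math.pi * freq / sample_rate  # phase-continuous sweep
            sample += amp * math.sin(phases[k])
        out.append(sample / n_components)   # crude normalisation
    return out

samples = risset_samples(0.05)
print(len(samples))
```

Because the set of component positions at time t + T equals the set at time t, the spectrum repeats every cycle even though each individual oscillator keeps climbing. Writing the samples to a WAV file (for example with the standard wave module) is left out to keep the sketch short.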
III. Psychoacoustic Limits and the Critical Band
The Shepard tone exploits the auditory system’s spectral processing, but there are other constraints on what we can hear and what we cannot. The most important is the critical band: the frequency resolution of the cochlear filter bank. Two pure tones closer than a critical bandwidth interact perceptually — instead of two distinct pitches, you hear a single tone with amplitude fluctuations called beats.
Critical bandwidth (equivalent rectangular bandwidth, ERB, after Glasberg & Moore):
ERB(f) = 24.7 × (4.37×10⁻³ × f + 1) (in Hz)
where f is centre frequency in Hz
Examples:
f = 100 Hz → ERB ≈ 35 Hz
f = 1000 Hz → ERB ≈ 133 Hz
f = 4000 Hz → ERB ≈ 456 Hz
f = 8000 Hz → ERB ≈ 888 Hz
Beat frequency when two tones f₁ and f₂ are less than one ERB apart:
f_beat = |f₁ - f₂|
Perceived as: amplitude modulation at f_beat Hz
Sweet spot for roughness: f_beat ≈ 0.25 × ERB(f_centre)
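The ERB formula and the beat relation are easy to evaluate directly. In this Python sketch the helper names are illustrative, and taking the centre frequency as the mean of the two tones is an assumption.

```python
def erb_hz(f_hz):
    """Equivalent rectangular bandwidth in Hz at centre frequency f_hz (Hz)."""
    return 24.7 * (4.37e-3 * f_hz + 1.0)

def beat_frequency(f1_hz, f2_hz):
    """Beat rate in Hz for two nearby pure tones."""
    return abs(f1_hz - f2_hz)

def within_one_erb(f1_hz, f2_hz):
    """True if the two tones are closer than one ERB at their mean frequency."""
    centre = (f1_hz + f2_hz) / 2.0
    return beat_frequency(f1_hz, f2_hz) < erb_hz(centre)

for f in (100, 1000, 4000, 8000):
    print(f, round(erb_hz(f), 1))
# roughly 35.5, 132.6, 456.5 and 888.2 Hz

print(beat_frequency(440.0, 444.0), within_one_erb(440.0, 444.0))  # 4.0 True
print(within_one_erb(440.0, 660.0))                                # False
```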
Critical bands explain several puzzling musical phenomena. Consonant intervals — the octave, perfect fifth, major third — are those whose harmonics either coincide exactly or fall in separate critical bands, producing no beating. Dissonant intervals pack harmonics close together within the same critical band, generating audible roughness. The harmonic series and the musical scale are not arbitrary cultural conventions; they reflect the spectral resolution of the human cochlea.
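One crude way to see this is to count how many pairs of harmonics from two notes land close together without coinciding. The Python sketch below is a simplified proxy, not an established roughness model: it assumes six equal-weight harmonics per note, a lower note at 220 Hz, and "close" meaning less than one ERB apart at the pair's mean frequency.

```python
def erb_hz(f_hz):
    """Equivalent rectangular bandwidth in Hz at centre frequency f_hz (Hz)."""
    return 24.7 * (4.37e-3 * f_hz + 1.0)

def clashing_pairs(ratio, f_low=220.0, n_harmonics=6):
    """Count harmonic pairs that are distinct but less than one ERB apart.

    ratio is the frequency ratio of the upper note to the lower note.
    Pairs that coincide exactly are not counted, since they do not beat.
    """
    lower = [f_low * n for n in range(1, n_harmonics + 1)]
    upper = [f_low * ratio * n for n in range(1, n_harmonics + 1)]
    count = 0
    for fa in lower:
        for fb in upper:
            gap = abs(fa - fb)
            if 1e-6 < gap < erb_hz((fa + fb) / 2.0):
                count += 1
    return count

for name, ratio in (("octave", 2.0), ("perfect fifth", 1.5), ("minor second", 16.0 / 15.0)):
    print(name, clashing_pairs(ratio))
```

Under these assumptions the octave produces no clashing pairs at all, and the minor second produces more than the perfect fifth, which matches the usual ordering from consonant to dissonant.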
This is also part of why the Shepard tone uses octave-spaced components rather than components packed more closely together. Octave spacing keeps every component on the same chroma, and octave-spaced sinusoids fall neatly into separate critical bands throughout the audible range, so they are processed as independent spectral objects. If the components were spaced closer than about one critical bandwidth, they would interact within the same critical band and the illusion would break down: the listener would hear beating rather than a smooth pitch.
The auditory system can extract a pitch from as few as three consecutive harmonics, even if the fundamental is absent. But the pitch becomes weak or ambiguous when only harmonics above roughly the 10th are present, because high harmonics fall within the same critical band as their neighbours and can no longer be individually resolved. This is why bass notes on a cello feel more “grounded” than high notes on a piccolo: lower fundamentals have their resolved harmonics in the most sensitive frequency region.
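The limit on resolved harmonics can be estimated from the ERB formula. The criterion used in this Python sketch is an assumption: harmonic n is treated as resolved when the spacing between neighbouring harmonics (f₀) exceeds one ERB at the harmonic's own frequency. Real resolvability is graded rather than all-or-nothing.

```python
def erb_hz(f_hz):
    """Equivalent rectangular bandwidth in Hz at centre frequency f_hz (Hz)."""
    return 24.7 * (4.37e-3 * f_hz + 1.0)

def highest_resolved_harmonic(f0_hz, max_n=50):
    """Largest n such that harmonics 1..n each satisfy ERB(n * f0) < f0.

    Returns 0 if even the fundamental fails the criterion.
    """
    n = 0
    while n < max_n and erb_hz((n + 1) * f0_hz) < f0_hz:
        n += 1
    return n

for f0 in (100.0, 200.0, 440.0):
    print(f0, highest_resolved_harmonic(f0))
# 100.0 -> 6, 200.0 -> 8, 440.0 -> 8
```

Under this simple criterion the count never exceeds 9 for any fundamental, because the ERB grows by about 11% of f₀ per harmonic. That is in the same range as the "roughly the 10th" figure quoted above.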
Try It Yourself
The best way to internalise these ideas is to manipulate the spectra directly and hear the results. These simulations let you experiment with everything discussed above.
Shepard Tone — The Infinite Auditory Staircase
Adjust the number of octave-spaced components, the width of the Gaussian amplitude envelope, and the rate of ascent. Switch between discrete and continuous (Risset) modes. Listen to the illusion and observe the spectrogram in real time.
Fourier Series — Build Any Waveform from Sine Waves
Draw a target waveform or choose a preset (square, sawtooth, triangle). See how adding harmonics one by one reconstructs the original shape. Hear the timbre change as each partial is toggled on or off.
Standing Waves — Harmonics on a String and in a Tube
Visualise the mode shapes of a vibrating string or air column at each harmonic. Superpose modes and watch how their sum evolves. Connects the physics of resonance to the spectral structure of real instruments.
Closing Thought
The Shepard tone is philosophically uncomfortable in a productive way. It demonstrates that pitch — one of the most immediate, seemingly objective qualities of a sound — is not a property of the sound wave itself. It is a construction performed by the brain, based on spectral patterns and hard-wired assumptions about how those patterns are generated in the natural world. When those assumptions are violated by a carefully designed stimulus, the brain produces a percept that is physically impossible.
Additive synthesis and the harmonic series are the formal language that makes such constructions possible. Once you understand that any sound is just a weighted sum of sinusoids, and that the auditory system is essentially inferring pitch from the frequency analysis performed along the basilar membrane, you have the tools to design sounds that do things ordinary instruments cannot: tones that spiral endlessly upward, chords that seem simultaneously consonant and tense, timbres that shift without any change in pitch. The mathematics of hearing is not separate from music; it is the deeper structure that music has always been navigating, whether the composer knew it or not.