Shannon-Nyquist Theorem — The Limits of Information

Two theorems define the hard boundaries of digital communication. The Nyquist-Shannon sampling theorem tells us the minimum sampling rate needed to perfectly capture any bandlimited analog signal. Claude Shannon's channel capacity formula tells us the maximum data rate any noisy channel can support, no matter how clever the encoding. Together they underpin virtually every digital communication system.

1. The Sampling Theorem

Continuous-time signals (audio, sensor readings) must be discretized before digital processing. The fundamental question is: how often must we sample to avoid losing information?

Nyquist-Shannon Sampling Theorem (1928/1949): A bandlimited signal with no frequency components above W Hz can be perfectly reconstructed from samples taken at a rate f_s > 2W samples per second.

Nyquist rate: f_s,min = 2W

CD audio: W = 20 kHz (human hearing limit), f_s = 44 100 Hz (slightly above the Nyquist rate)
Medical ultrasound: W = 15 MHz → f_s > 30 MSPS (megasamples/sec)
AM radio (0-10 kHz audio): f_s = 22 050 Hz is sufficient

The theorem is exact, not approximate. Sampling at any rate above 2W allows perfect recovery of everything below W Hz: no information is lost, despite the discretization. (At exactly 2W, a component at W Hz itself can be lost, which is why the inequality is strict.) This seems paradoxical: a discrete sequence of numbers fully encodes a continuous waveform.

Why it works: Sampling in time replicates the signal's spectrum periodically in frequency. If f_s > 2W, the spectral copies don't overlap and can be isolated by a low-pass filter (the reconstruction filter). The Whittaker-Kotelnikov-Shannon interpolation formula does this exactly using sinc functions.
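The strict inequality f_s > 2W can be illustrated with a short sketch. The tone frequency and the helper name `sample_sine` are assumptions chosen for illustration, not part of the original text: a sine at exactly W Hz sampled at exactly 2W lands on its zero crossings, while a slightly higher rate captures it.

```python
import math

def sample_sine(freq_hz, fs_hz, n_samples):
    """Return n_samples samples of sin(2*pi*freq_hz*t) taken at rate fs_hz."""
    return [math.sin(2 * math.pi * freq_hz * n / fs_hz) for n in range(n_samples)]

W = 1000.0  # hypothetical highest frequency in the signal, Hz

# Sampling at exactly 2W: every sample is sin(pi*n), i.e. zero up to rounding.
critical = sample_sine(W, 2 * W, 8)

# Sampling at 2.5W (above the Nyquist rate): the tone is clearly visible.
above = sample_sine(W, 2.5 * W, 8)

print("peak at f_s = 2W:  ", max(abs(v) for v in critical))
print("peak at f_s = 2.5W:", max(abs(v) for v in above))
```

The first peak is at the level of floating-point rounding error, so the tone at W Hz is invisible in those samples; this is the edge case the strict inequality excludes.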

2. Aliasing

When f_s < 2W, the spectral copies overlap — high-frequency components fold back into the baseband and appear as low-frequency artefacts. This is aliasing:

A sinusoid at f Hz sampled at f_s Hz appears as a sinusoid at:

f_alias = |f − round(f/f_s) · f_s|

Example: f = 7 kHz, f_s = 8 kHz
f_alias = |7000 − 1 × 8000| = 1000 Hz ← the 7 kHz tone becomes 1 kHz!

In images (spatial aliasing): a fine checkerboard photographed too coarsely produces moiré patterns, large-scale interference fringes that were not in the scene.
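The folding formula above translates directly into a few lines of code. This is a minimal sketch; the function name `alias_frequency` is an illustrative choice.

```python
def alias_frequency(f_hz, fs_hz):
    """Apparent frequency of a tone at f_hz after sampling at fs_hz."""
    # Subtract the nearest integer multiple of the sampling rate,
    # then take the magnitude: the result always lies in [0, fs_hz / 2].
    return abs(f_hz - round(f_hz / fs_hz) * fs_hz)

print(alias_frequency(7000.0, 8000.0))  # the 7 kHz example from the text
print(alias_frequency(3000.0, 8000.0))  # below f_s/2: the tone is unchanged
print(alias_frequency(9000.0, 8000.0))  # 9 kHz also folds down to 1 kHz
```

Tones at 7 kHz and 9 kHz both map to 1 kHz at f_s = 8 kHz, which is why aliased components cannot be separated from genuine ones after sampling.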

Anti-aliasing filters band-limit the signal to below f_s/2 before sampling. In audio ADCs, this is typically an analog elliptic low-pass filter with a very sharp cutoff. In digital cameras, an optical low-pass filter (OLPF) blurs the image slightly to remove spatial frequencies beyond the pixel Nyquist limit.

Temporal aliasing in video: helicopter rotors in film sometimes appear to rotate slowly or backwards because the frame rate (24 fps) aliases the blade rotation frequency. The sampling theorem applies to any sampled dimension: time between frames just as much as space across pixels.

3. Perfect Reconstruction

Given samples x[n] taken at a rate f_s > 2W, the original continuous signal x(t) is recovered by:

x(t) = Σ_{n=−∞}^{∞} x[n] · sinc((t − n/f_s) · f_s), where sinc(u) = sin(πu) / (πu)

Each sample contributes a sinc "pulse" centred at its time position. The sincs are orthogonal, so samples perfectly partition the signal energy.

In practice, perfect sinc interpolation requires infinite-length filters. Real systems use windowed sinc filters (e.g., Kaiser-windowed FIR) or polyphase filterbanks that achieve reconstruction error below the noise floor of the ADC.
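As a rough sketch of the interpolation formula above, the infinite sum can be truncated to a window of samples around the point of interest. The 1 Hz tone, the 8 Hz sampling rate, the window size, and the names `reconstruct` and `tone_sample` are all assumptions for illustration; a real system would use a windowed FIR filter rather than this direct sum.

```python
import math

def sinc(u):
    """Normalized sinc: sin(pi*u) / (pi*u), with sinc(0) = 1."""
    return 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)

def reconstruct(sample_fn, fs_hz, t, half_width=2000):
    """Approximate x(t) from samples x[n] = sample_fn(n) with a truncated sinc sum."""
    center = round(t * fs_hz)  # index of the sample nearest to time t
    total = 0.0
    for n in range(center - half_width, center + half_width + 1):
        total += sample_fn(n) * sinc(t * fs_hz - n)
    return total

F_TONE = 1.0  # hypothetical tone frequency, Hz
FS = 8.0      # hypothetical sampling rate, well above 2 * F_TONE

def tone_sample(n):
    """Sample n of the test tone."""
    return math.sin(2 * math.pi * F_TONE * n / FS)

t = 0.3  # a time that falls between two sample instants
approx = reconstruct(tone_sample, FS, t)
exact = math.sin(2 * math.pi * F_TONE * t)
print(approx, exact)
```

The two printed values agree closely but not exactly: the residual comes from cutting off the slowly decaying sinc tails, which is the practical problem that windowed filters address.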

4. Shannon Entropy

Claude Shannon's 1948 paper "A Mathematical Theory of Communication" introduced a precise measure of information: entropy.

H(X) = −Σᵢ pᵢ · log₂(pᵢ) (bits), where pᵢ = probability of symbol i

Fair coin (p = 0.5, 0.5): H = 1 bit (maximum uncertainty)
Die (p = 1/6 each): H = log₂(6) ≈ 2.585 bits
Biased coin (p = 0.9, 0.1): H = −(0.9 log₂ 0.9 + 0.1 log₂ 0.1) ≈ 0.47 bits
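The entropy formula and the three examples above can be checked with a few lines of code. This is a minimal sketch; the function name `entropy_bits` is an illustrative choice.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits of a discrete distribution given as probabilities."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Symbols with zero probability contribute nothing to the sum.
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(round(entropy_bits([0.5, 0.5]), 3))    # fair coin
print(round(entropy_bits([1 / 6] * 6), 3))   # fair die
print(round(entropy_bits([0.9, 0.1]), 3))    # biased coin
```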

Entropy is the minimum average number of bits per symbol needed to encode the output of a random source. Shannon's source coding theorem proved that no lossless compression scheme can achieve an average rate below the entropy. Huffman coding is optimal among codes that assign each symbol a whole number of bits, and its average length is always within 1 bit per symbol of the entropy.
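How close Huffman coding gets to the entropy depends on the distribution. The sketch below builds Huffman codeword lengths with a heap and compares the average length to the entropy; the two example distributions and the helper names are assumptions for illustration.

```python
import heapq
import math

def entropy_bits(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def huffman_lengths(probs):
    """Return the Huffman codeword length of each symbol (at least two symbols)."""
    # Heap entries: (subtree probability, unique tie-breaker, symbols in subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    tie_breaker = len(probs)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        # Merging two subtrees puts every symbol in them one level deeper.
        for s in syms1 + syms2:
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, tie_breaker, syms1 + syms2))
        tie_breaker += 1
    return lengths

def average_length(probs):
    """Expected codeword length in bits per symbol."""
    return sum(p * n for p, n in zip(probs, huffman_lengths(probs)))

dyadic = [0.5, 0.25, 0.125, 0.125]  # probabilities are powers of 1/2
biased = [0.9, 0.1]                 # the biased coin from the text

print(average_length(dyadic), entropy_bits(dyadic))
print(average_length(biased), entropy_bits(biased))
```

For the dyadic distribution the average length equals the entropy. For the biased coin a symbol-by-symbol code cannot spend less than 1 bit per flip, so it stays well above the entropy of about 0.47 bits; closing that gap requires coding blocks of symbols or using arithmetic coding.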

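For English text, entropy is estimated at approximately 1.0–1.5 bits/character (due to massive redundancy in the language). General-purpose compressors such as ZIP/gzip typically reach only about 2–3 bits/character on plain English text, so they stop well short of this limit.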

5. Shannon Channel Capacity

Shannon's channel coding theorem establishes the maximum rate at which information can be transmitted over a noisy channel with arbitrarily low error:

C = B · log₂(1 + S/N) bits per second

B = bandwidth (Hz)
S = signal power (W)
N = noise power in bandwidth B; for thermal noise, N = k_B · T · B (k_B = Boltzmann constant = 1.38×10⁻²³ J/K, T = temperature in kelvin)

This is the Shannon-Hartley theorem, a hard limit that no technology can exceed. Modern codes (LDPC, turbo codes) come within a fraction of a dB of it.

Implications

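Bandwidth and SNR can be traded against each other: at fixed SNR, doubling bandwidth doubles capacity, whereas at high SNR each additional 3 dB (a doubling of signal power) adds only about 1 bit/s/Hz. At high SNR, bandwidth is therefore more valuable than power.
Infinite SNR → infinite capacity? Only in principle: at high SNR, capacity grows just logarithmically with power. Infinite bandwidth does not give infinite capacity either: with fixed signal power S and noise density N₀, the noise power grows with B and capacity saturates at (S/N₀) · log₂(e) ≈ 1.44 · S/N₀ bits per second.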
Thermal noise floor: at 290 K (room temperature), thermal noise power density = k_B · T ≈ −174 dBm/Hz. This sets the lower bound on noise for a receiver at that temperature and thus limits the achievable capacity; only cooling the receiver front end lowers it.
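The thermal-noise figure and the behaviour of capacity as bandwidth grows can be checked numerically. In this sketch the received signal power of 1 pW and the list of bandwidths are hypothetical values chosen only for illustration; noise is assumed to be purely thermal, N = k_B · T · B.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def noise_density_dbm_per_hz(temp_k):
    """Thermal noise density k_B*T expressed in dBm/Hz."""
    watts_per_hz = K_B * temp_k
    return 10 * math.log10(watts_per_hz / 1e-3)

def capacity_bps(bandwidth_hz, signal_w, temp_k=290.0):
    """Shannon-Hartley capacity when the only noise is thermal noise."""
    noise_w = K_B * temp_k * bandwidth_hz
    # log1p keeps precision when the SNR is very small.
    return bandwidth_hz * math.log1p(signal_w / noise_w) / math.log(2)

print(round(noise_density_dbm_per_hz(290.0), 1), "dBm/Hz")

SIGNAL_W = 1e-12  # hypothetical received power (1 pW, i.e. -90 dBm)
# Wideband limit of capacity at fixed power: (S/N0) * log2(e).
limit_bps = SIGNAL_W / (K_B * 290.0) * math.log2(math.e)

for b in (1e6, 1e8, 1e10, 1e12):
    print(f"B = {b:.0e} Hz -> C = {capacity_bps(b, SIGNAL_W):.3e} bit/s")
print(f"wideband limit  -> {limit_bps:.3e} bit/s")
```

With power held fixed, each increase in bandwidth buys less, and capacity approaches the wideband limit rather than growing without bound.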

Shannon limit examples:

DSL modem: B = 1.1 MHz, SNR = 40 dB → C = 1.1×10⁶ × 13.3 ≈ 14.6 Mbps
5G NR: B = 100 MHz, SNR = 30 dB → C ≈ 10⁸ × 10 = 1 Gbps per spatial stream
Fiber: B = 4 THz (C-band), SNR = 24 dB → C ≈ 4×10¹² × 8 ≈ 32 Tbps per polarization and spatial mode; petabit-scale rates require many parallel channels
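These figures follow from the capacity formula once the SNR is converted from dB to a linear power ratio. A minimal sketch, with `shannon_capacity_bps` as an illustrative helper name:

```python
import math

def shannon_capacity_bps(bandwidth_hz, snr_db):
    """Shannon-Hartley capacity C = B * log2(1 + S/N), with the SNR given in dB."""
    snr_linear = 10 ** (snr_db / 10)  # dB -> linear power ratio
    return bandwidth_hz * math.log2(1 + snr_linear)

dsl = shannon_capacity_bps(1.1e6, 40)    # DSL modem example
nr = shannon_capacity_bps(100e6, 30)     # 5G NR example
fiber = shannon_capacity_bps(4e12, 24)   # C-band fiber example

print(f"DSL:   {dsl / 1e6:.1f} Mbit/s")
print(f"5G NR: {nr / 1e9:.2f} Gbit/s")
print(f"Fiber: {fiber / 1e12:.1f} Tbit/s")
```

Each value is the limit for a single channel with the stated bandwidth and SNR; systems that use several spatial streams, polarizations, or fiber cores multiply it by the number of independent channels.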

6. Error-Correcting Codes and the Shannon Limit

Shannon's theorem is an existence proof: it says reliable transmission at any rate below C is possible, but it does not say how to achieve it. The search for practical codes that approach the Shannon limit took nearly 50 years:

Hamming codes (1950): correct single errors; far from the Shannon limit.
Convolutional codes + Viterbi decoding (1967): used in early satellite and deep-space communication (Voyager). Within a few dB of limit.
Turbo codes (1993): within 0.5 dB of Shannon limit; revolutionized 3G cellular.
LDPC codes (invented by Gallager in the early 1960s, rediscovered 1996): within 0.0045 dB of the Shannon limit for very long block lengths; used in WiFi 6, 10GbE, DVB-S2, CCSDS deep-space.
Polar codes (2009): provably achieve Shannon capacity as code length → ∞; used in 5G NR.
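The first entry in this list is simple enough to show in full. The sketch below implements the classic Hamming(7,4) code with parity bits at positions 1, 2 and 4; the function names and the bit layout are one conventional choice, labelled here as assumptions rather than taken from the original text.

```python
def hamming74_encode(data):
    """Encode 4 data bits into a 7-bit codeword [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers codeword positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]

def hamming74_decode(codeword):
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The syndrome, read as a binary number, is the 1-based error position.
    error_pos = s1 + 2 * s2 + 4 * s4
    if error_pos:
        c[error_pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

message = [1, 0, 1, 1]
sent = hamming74_encode(message)
received = list(sent)
received[4] ^= 1  # flip one bit in transit
print(hamming74_decode(received) == message)
```

The code spends 3 parity bits on every 4 data bits yet corrects only one error per block, which gives a sense of why it sits far from the Shannon limit.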

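Code rate vs capacity: a rate-½ code adds one parity bit per data bit. On the real-valued AWGN channel at 0 dB SNR (signal power = noise power), Shannon capacity is ½ · log₂(1 + 1) = 0.5 bits per channel use, which corresponds to 1 bit/s/Hz in the formula C = B · log₂(1 + S/N) because a bandwidth of B Hz carries 2B real samples per second. A rate-½ code operating right at that SNR needs to be infinitely long to approach zero error probability; in practice, codes of length ~10 000 bits come within about 0.5 dB.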
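The link between code rate and the minimum SNR can be sketched numerically. This assumes the real-valued AWGN channel, whose capacity per channel use is ½ · log₂(1 + SNR); the function names are illustrative.

```python
import math

def capacity_per_use(snr_linear):
    """Capacity of the real-valued AWGN channel in bits per channel use."""
    return 0.5 * math.log2(1 + snr_linear)

def min_snr_db(rate_bits_per_use):
    """Smallest SNR in dB at which the given rate is below or equal to capacity."""
    # Invert R = 0.5 * log2(1 + SNR) for the SNR.
    snr_linear = 2 ** (2 * rate_bits_per_use) - 1
    return 10 * math.log10(snr_linear)

print(capacity_per_use(1.0))  # SNR = 0 dB, i.e. signal power = noise power
for rate in (0.25, 0.5, 0.75):
    print(f"rate {rate}: SNR must exceed {min_snr_db(rate):.2f} dB")
```

A practical code is then judged by how many dB above this minimum it needs to operate to reach its target error rate.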

7. Real-World Applications

CD audio (44 100 Hz/16-bit): Nyquist theorem applied to 20 kHz hearing limit. The 44.1 kHz rate has 10% margin for the anti-aliasing filter rolloff.
ADSL/VDSL broadband: discrete multitone (DMT) modulation, a form of OFDM, across the telephone wire bandwidth; each subcarrier carries as many bits as the SNR of its frequency range allows (bit loading).
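Deep-space communication (Voyager, New Horizons): extremely low SNR (the signal travels more than 20 billion km in Voyager's case); powerful codes are essential for any data at all: concatenated convolutional and Reed-Solomon codes on Voyager, turbo codes on New Horizons, and LDPC codes in newer CCSDS standards.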
Digital cameras: Bayer colour filter arrays sample each colour on a subset of the sensor's pixel grid; demosaicing is a form of interpolation that must respect the spatial Nyquist limit.
Medical imaging (MRI): k-space is sampled; reconstruction is an inverse 2-D Fourier transform; compressed sensing allows sub-Nyquist sampling when images are sparse in some domain.
