Information Theory · Signal Processing · Mathematics
📅 April 2026 ⏱ ≈ 12 min read 🎯 Intermediate

Shannon-Nyquist Theorem — The Limits of Information

Two theorems define the hard boundaries of digital communication. The Nyquist-Shannon sampling theorem gives the minimum sampling rate needed to capture a bandlimited analog signal without loss. Claude Shannon's channel capacity formula gives the maximum data rate a noisy channel can support, no matter how clever the encoding. Together they underpin essentially every modern digital communication system.

1. The Sampling Theorem

Continuous-time signals (audio, sensor readings) must be discretized before digital processing. The fundamental question is: how often must we sample to avoid losing information?

Nyquist-Shannon Sampling Theorem (1928/1949): A bandlimited signal with no frequency components above W Hz can be perfectly reconstructed from samples taken at a rate f_s > 2W samples per second.

Nyquist rate: f_s_min = 2W
CD audio: W = 20 kHz (human hearing limit), f_s = 44 100 Hz (slightly above Nyquist)
Medical ultrasound: W = 15 MHz → f_s ≥ 30 MSPS (megasamples/sec)
AM radio (0–10 kHz audio): f_s = 22 050 Hz sufficient

The theorem is exact, not approximate. Sampling at any rate above 2W allows perfect recovery of everything below W Hz: no information is lost, despite the discretization. The inequality is strict because at exactly 2W a component sitting right at W Hz can be lost. The result seems paradoxical: a discrete sequence of numbers fully encodes a continuous waveform.

Why it works: Sampling in time replicates the signal's spectrum periodically in frequency. If f_s > 2W, the spectral copies don't overlap and can be isolated by a low-pass filter (the reconstruction filter). The Whittaker-Kotelnikov-Shannon interpolation formula does this exactly using sinc functions.
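The strict inequality f_s > 2W in the theorem statement matters at the band edge. The following is a minimal Python sketch, not a rigorous proof: it samples a sine at exactly W Hz with f_s = 2W and with f_s = 2.5W. The helper name sample_sine and the value W = 1000 Hz are illustrative assumptions.

```python
import math

def sample_sine(freq_hz, rate_hz, count):
    """Return `count` samples of sin(2*pi*f*t) taken at `rate_hz` samples/sec."""
    return [math.sin(2 * math.pi * freq_hz * n / rate_hz) for n in range(count)]

W = 1000.0  # illustrative band edge in Hz (assumed value)

at_limit = sample_sine(W, 2.0 * W, 8)     # f_s = 2W exactly
above_limit = sample_sine(W, 2.5 * W, 8)  # f_s > 2W

# At f_s = 2W every sample lands on a zero crossing of the sine,
# so the samples carry no trace of the tone (up to rounding error).
print(max(abs(s) for s in at_limit))

# Slightly faster sampling produces clearly nonzero samples.
print(max(abs(s) for s in above_limit))
```

With the sine phase used here, the samples at f_s = 2W are indistinguishable from silence, which is why the theorem asks for a rate strictly above 2W.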

2. Aliasing

When f_s < 2W, the spectral copies overlap — high-frequency components fold back into the baseband and appear as low-frequency artefacts. This is aliasing:

A sinusoid at f Hz sampled at f_s Hz appears as a sinusoid at:
f_alias = |f − round(f/f_s) · f_s|
Example: f = 7 kHz, f_s = 8 kHz
f_alias = |7000 − 1×8000| = 1000 Hz ← the 7 kHz tone becomes 1 kHz!

In images (spatial aliasing): a fine checkerboard photographed too coarsely produces moiré patterns, large-scale interference fringes that weren't there.
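The alias formula can be checked numerically. The sketch below, in Python with only the standard library, computes f_alias for the 7 kHz example and then confirms that a 7 kHz cosine and a 1 kHz cosine produce the same samples at f_s = 8 kHz. The function names are illustrative, not taken from any library.

```python
import math

def alias_frequency(f_hz, fs_hz):
    """Apparent frequency of a tone at f_hz when sampled at fs_hz."""
    return abs(f_hz - round(f_hz / fs_hz) * fs_hz)

def sample_cosine(f_hz, fs_hz, count):
    """Return `count` samples of cos(2*pi*f*t) taken at fs_hz samples/sec."""
    return [math.cos(2 * math.pi * f_hz * n / fs_hz) for n in range(count)]

fs = 8000.0
f_alias = alias_frequency(7000.0, fs)
print(f_alias)  # 1000.0

# The sampled sequences of the true tone and its alias coincide.
tone = sample_cosine(7000.0, fs, 16)
alias = sample_cosine(f_alias, fs, 16)
max_difference = max(abs(a - b) for a, b in zip(tone, alias))
print(max_difference)  # rounding error only
```

Once the samples are taken, nothing in the data distinguishes the two tones, which is why the band-limiting has to happen before sampling.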

Anti-aliasing filters band-limit the signal to below f_s/2 before sampling. In audio ADCs, this has classically been an analog low-pass filter with a very sharp cutoff (for example an elliptic design); oversampling converters relax the analog filter and do most of the band-limiting digitally. In digital cameras, an optical low-pass filter (OLPF) blurs the image slightly to remove spatial frequencies beyond the pixel Nyquist limit.

Temporal aliasing in video: helicopter rotors in film sometimes appear to rotate slowly or backwards because the frame rate (24 fps) undersamples the blade rotation frequency. The sampling theorem applies to any sampled dimension, whether time (video frames) or space (image pixels).
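The apparent direction of rotation follows from the same folding rule if the sign is kept. The short Python sketch below assumes the blade pattern repeats at a hypothetical 23 Hz or 25 Hz and is filmed at 24 fps; these rates and the function name are illustrative assumptions, not measurements.

```python
def apparent_frequency(true_hz, frame_rate_hz):
    """Signed apparent frequency after sampling; negative means backwards motion."""
    nearest_multiple = round(true_hz / frame_rate_hz)
    return true_hz - nearest_multiple * frame_rate_hz

frame_rate = 24.0  # frames per second, as in the text

for pattern_hz in (23.0, 24.0, 25.0):  # hypothetical blade-pattern rates
    print(pattern_hz, apparent_frequency(pattern_hz, frame_rate))
# 23 Hz appears as -1 Hz (slow backwards rotation),
# 24 Hz appears frozen, and 25 Hz appears as +1 Hz (slow forwards rotation).
```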

3. Perfect Reconstruction

Given samples x[n] taken at a rate f_s above the Nyquist rate, the original continuous signal x(t) is recovered by:

x(t) = Σ_{n=−∞}^{∞} x[n] · sinc( (t − n/f_s) · f_s )
sinc(u) = sin(πu) / (πu)

Each sample contributes a sinc "pulse" centred at its time position. The sincs are orthogonal, so samples perfectly partition the signal energy.

In practice, perfect sinc interpolation requires infinite-length filters. Real systems use windowed sinc filters (e.g., Kaiser-windowed FIR) or polyphase filterbanks that achieve reconstruction error below the noise floor of the ADC.
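A truncated version of the interpolation sum shows both the idea and the practical limitation. The Python sketch below reconstructs a 3 Hz sine sampled at 10 Hz from a finite window of samples; the tone, rate, window size, and function names are illustrative assumptions, and the plain truncation used here is cruder than the windowed designs real systems use.

```python
import math

def sinc(u):
    """Normalized sinc: sin(pi*u) / (pi*u), with sinc(0) = 1."""
    return 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)

def reconstruct(samples, first_index, rate_hz, t):
    """Evaluate the sum of x[n] * sinc((t - n/f_s) * f_s) over the given samples."""
    total = 0.0
    for offset, value in enumerate(samples):
        n = first_index + offset
        total += value * sinc((t - n / rate_hz) * rate_hz)
    return total

tone_hz = 3.0     # below f_s / 2 = 5 Hz
rate_hz = 10.0
half_span = 2000  # use samples n = -2000 .. 2000 (finite stand-in for the infinite sum)

samples = [math.sin(2 * math.pi * tone_hz * n / rate_hz)
           for n in range(-half_span, half_span + 1)]

t = 0.0123  # a time instant between two samples
estimate = reconstruct(samples, -half_span, rate_hz, t)
exact = math.sin(2 * math.pi * tone_hz * t)
print(estimate, exact, abs(estimate - exact))
```

The residual error comes from cutting off the sinc tails, which decay only as 1/n. That slow decay is the reason practical filters apply a window rather than a hard truncation.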

4. Shannon Entropy

Claude Shannon's 1948 paper "A Mathematical Theory of Communication" introduced a precise measure of information: entropy.

H(X) = −Σᵢ pᵢ · log₂(pᵢ) (bits), where pᵢ = probability of symbol i
Fair coin (p = 0.5, 0.5): H = 1 bit (maximum uncertainty)
Die (p = 1/6 each): H = log₂(6) ≈ 2.585 bits
Biased coin (p = 0.9, 0.1): H = −(0.9 log₂0.9 + 0.1 log₂0.1) ≈ 0.47 bits
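The three entropy values can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library; the function name entropy_bits is an illustrative choice.

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy in bits; zero-probability symbols contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

fair_coin = entropy_bits([0.5, 0.5])
die = entropy_bits([1 / 6] * 6)
biased_coin = entropy_bits([0.9, 0.1])

print(fair_coin)              # 1.0
print(round(die, 3))          # 2.585
print(round(biased_coin, 3))  # 0.469
```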

Entropy is the minimum average number of bits per symbol needed to losslessly encode the output of a random source. Shannon's source coding theorem proved that no lossless compression scheme can compress below the entropy rate on average. Huffman coding is optimal among symbol-by-symbol prefix codes with integer codeword lengths, and its average length is always within 1 bit per symbol of the entropy.
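Because Huffman codeword lengths are whole numbers of bits, the average length equals the entropy only when every probability is a power of ½. The Python sketch below computes Huffman codeword lengths with a heap and compares the average length with the entropy for two sources; the function names and example distributions are illustrative assumptions.

```python
import heapq
import math

def entropy_bits(probabilities):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def huffman_lengths(probabilities):
    """Return the Huffman codeword length (in bits) for each symbol."""
    if len(probabilities) == 1:
        return [1]
    # Heap entries: (subtree probability, unique tie-breaker, symbols in subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probabilities)]
    heapq.heapify(heap)
    lengths = [0] * len(probabilities)
    tie_breaker = len(probabilities)
    while len(heap) > 1:
        p1, _, symbols1 = heapq.heappop(heap)
        p2, _, symbols2 = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every symbol beneath them.
        for symbol in symbols1 + symbols2:
            lengths[symbol] += 1
        heapq.heappush(heap, (p1 + p2, tie_breaker, symbols1 + symbols2))
        tie_breaker += 1
    return lengths

def average_length(probabilities):
    lengths = huffman_lengths(probabilities)
    return sum(p * n for p, n in zip(probabilities, lengths))

dyadic = [0.5, 0.25, 0.125, 0.125]
biased_coin = [0.9, 0.1]

print(average_length(dyadic), entropy_bits(dyadic))            # 1.75 and 1.75
print(average_length(biased_coin), entropy_bits(biased_coin))  # 1.0 and about 0.469
```

For the biased coin, a per-symbol code cannot spend less than 1 bit per flip. Closing the gap to 0.469 bits requires coding blocks of symbols together or using a scheme such as arithmetic coding.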

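For English text, entropy is estimated at roughly 1.0–1.5 bits/character (due to massive redundancy in the language). General-purpose compressors such as ZIP/gzip typically reach only about 2–3 bits/character on plain English text, so they remain well above this bound.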

5. Shannon Channel Capacity

Shannon's channel coding theorem establishes the maximum rate at which information can be transmitted over a noisy channel with arbitrarily low error:

C = B · log₂(1 + S/N) bits per second
B = bandwidth (Hz)
S = signal power (W)
N = noise power in bandwidth B (W); for thermal noise, N = k_B · T · B (k_B = Boltzmann constant = 1.38×10⁻²³ J/K, T = temperature in kelvin)

This is the Shannon-Hartley theorem, a hard limit for the additive white Gaussian noise channel that no technology can exceed. Modern codes (LDPC, turbo codes) come within a fraction of a dB of it.
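The thermal noise term N = k_B · T · B sets the floor against which signal power is measured. The Python sketch below evaluates it for an assumed temperature of 290 K and an assumed bandwidth of 1 MHz, then derives the SNR for a hypothetical received power of −90 dBm. All three numbers are illustrative assumptions, and real receivers add their own noise on top of this floor.

```python
import math

BOLTZMANN = 1.38e-23  # J/K, the value quoted in the text

def thermal_noise_watts(temperature_k, bandwidth_hz):
    """Thermal noise power N = k_B * T * B in watts."""
    return BOLTZMANN * temperature_k * bandwidth_hz

def watts_to_dbm(power_watts):
    """Convert a power in watts to dBm (decibels relative to 1 mW)."""
    return 10 * math.log10(power_watts / 1e-3)

noise_watts = thermal_noise_watts(290.0, 1e6)  # assumed 290 K, 1 MHz
noise_dbm = watts_to_dbm(noise_watts)
print(round(noise_dbm, 2))  # about -114 dBm

received_dbm = -90.0  # hypothetical received signal power
snr_db = received_dbm - noise_dbm
print(round(snr_db, 2))  # about 24 dB
```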

Implications

Shannon limit examples:
DSL modem: B = 1.1 MHz, SNR = 40 dB → C = 1.1×10⁶ × 13.3 ≈ 14.6 Mbps
5G NR: B = 100 MHz, SNR = 30 dB → C ≈ 10⁸ × 10 = 1 Gbps per spatial stream
Fiber: B = 4 THz (C-band), SNR = 24 dB → C ≈ 4×10¹² × 8 ≈ 32 Tbps per fibre core and polarization
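These figures follow directly from C = B · log₂(1 + S/N) once the SNR in dB is converted to a linear power ratio. A minimal Python sketch, with an illustrative function name:

```python
import math

def shannon_capacity_bps(bandwidth_hz, snr_db):
    """Shannon-Hartley capacity in bits per second for an SNR given in dB."""
    snr_linear = 10 ** (snr_db / 10)  # dB to linear power ratio
    return bandwidth_hz * math.log2(1 + snr_linear)

dsl = shannon_capacity_bps(1.1e6, 40)
nr = shannon_capacity_bps(100e6, 30)
fiber = shannon_capacity_bps(4e12, 24)

print(f"DSL:   {dsl / 1e6:.1f} Mbps")     # about 14.6 Mbps
print(f"5G NR: {nr / 1e9:.2f} Gbps")      # about 1.00 Gbps
print(f"Fiber: {fiber / 1e12:.1f} Tbps")  # about 31.9 Tbps
```

Each result is the limit for a single channel with the stated bandwidth and SNR. Systems that use several spatial streams, polarizations, or fibre cores multiply the number of such channels rather than exceeding the limit on any one of them.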

6. Error-Correcting Codes and the Shannon Limit

Shannon's theorem is an existence proof: it says reliable transmission at any rate below C is possible, but not how to achieve it. The search for practical codes that approach the Shannon limit took nearly 50 years.

Code rate vs capacity: a rate-½ code adds one parity bit per data bit. On the real-valued AWGN channel at 0 dB SNR (signal power = noise power), Shannon capacity is ½ · log₂(1 + 1) = 0.5 bits per channel use, which corresponds to 1 bps/Hz in the Shannon-Hartley formula. A rate-½ code operating right at that SNR would need to be infinitely long to approach zero error probability; in practice, codes of length ~10 000 bits come within about 0.5 dB.
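The link between code rate and the minimum usable SNR can be made explicit by inverting the capacity of the real-valued AWGN channel, ½ · log₂(1 + SNR) bits per channel use. The Python sketch below does this for a few rates. It assumes ideal Gaussian inputs and one coded bit per channel use; the limit for binary signalling is slightly higher. The function name and the rates other than ½ are illustrative.

```python
import math

def min_snr_db(bits_per_channel_use):
    """Smallest SNR in dB at which a real AWGN channel supports the given rate.

    Solves R = 0.5 * log2(1 + SNR) for SNR.
    """
    snr_linear = 2 ** (2 * bits_per_channel_use) - 1
    return 10 * math.log10(snr_linear)

for rate in (1 / 3, 1 / 2, 3 / 4):
    print(f"rate {rate:.3f}: SNR >= {min_snr_db(rate):.2f} dB")
# rate 1/3 needs about -2.31 dB, rate 1/2 needs 0 dB, rate 3/4 needs about 2.62 dB
```

A practical rate-½ code that needs, say, 0.5 dB to reach its target error rate is said to operate 0.5 dB from the Shannon limit, since the limit for that rate sits at 0 dB.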

7. Real-World Applications
