X-Ray Crystallography — How We See Atoms
In 1912, Max von Laue, working with Walter Friedrich and Paul Knipping, passed X-rays through a copper sulphate crystal and recorded a pattern of spots on a photographic plate. That image launched a century of discovery: the double helix, the structure of penicillin, haemoglobin, the ribosome, and tens of thousands of drug targets. X-ray crystallography remains one of the most precise methods available for determining how atoms are arranged in matter; in well-ordered crystals, atomic positions can be located to within a few hundredths of an angstrom.
1. Crystal Lattices and Unit Cells
A crystal is a solid in which atoms, ions, or molecules are arranged in a perfectly periodic three-dimensional pattern. This periodicity is described by a Bravais lattice — an infinite set of points generated by three lattice vectors a, b, c, where any lattice point can be reached from the origin by an integer combination n₁a + n₂b + n₃c.
The unit cell is the smallest repeating unit of the lattice — a parallelepiped defined by the three lattice vectors and their interaxial angles α, β, γ. Everything about the crystal's structure is encoded in the contents and geometry of a single unit cell. Seven crystal systems (triclinic, monoclinic, orthorhombic, tetragonal, trigonal, hexagonal, cubic) and 14 distinct Bravais lattice types exhaust all possible translational symmetries in three dimensions.
Within the unit cell, atoms occupy specific fractional coordinates (x, y, z) where each coordinate ranges from 0 to 1. The full crystal structure is then generated by applying all crystallographic symmetry operations (rotations, reflections, screw axes, glide planes) defined by the crystal's space group. There are exactly 230 distinct space groups.
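As a rough illustration of how cell geometry and fractional coordinates fit together, the sketch below computes the volume of a general unit cell and converts fractional coordinates to Cartesian ones. The function names and the axis convention (a along x, b in the xy plane) are assumptions chosen for this example, not something the text prescribes.

```python
import math

def cell_volume(a, b, c, alpha, beta, gamma):
    """Volume of a general (triclinic) unit cell; lengths in Å, angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(t)) for t in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)

def frac_to_cart(frac, a, b, c, alpha, beta, gamma):
    """Fractional (x, y, z) to Cartesian Å, assuming a lies along x and b in the xy plane."""
    ca, cb, cg = (math.cos(math.radians(t)) for t in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    volume = cell_volume(a, b, c, alpha, beta, gamma)
    x, y, z = frac
    return (
        a * x + b * cg * y + c * cb * z,
        b * sg * y + c * (ca - cb * cg) / sg * z,
        volume / (a * b * sg) * z,
    )

# Centre of a 4 Å cubic cell, in Cartesian coordinates
print(frac_to_cart((0.5, 0.5, 0.5), 4, 4, 4, 90, 90, 90))
```

For a cubic cell the conversion reduces to multiplying each fractional coordinate by the cell edge; the general form matters only when the angles differ from 90°.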
2. Miller Indices and Crystal Planes
Any set of parallel planes passing through lattice points can be described by three integers (h, k, l) called Miller indices. These are defined as the reciprocals of the fractional intercepts the plane makes with the unit cell axes a, b, c, scaled to the smallest integers.
For example, a plane that intersects the a axis at 1/2, the b axis at 1/3, and runs parallel to c (intercept at infinity) has fractional intercepts (1/2, 1/3, ∞) and reciprocals (2, 3, 0), giving Miller indices (230). The spacing d between adjacent planes in a family (hkl) depends on both the Miller indices and the lattice parameters. For a cubic crystal with lattice parameter a:
d = a / √(h² + k² + l²)
Miller indices provide a complete catalogue of all planes in a crystal. Each family of planes produces a distinct diffraction spot (reflection), and the full set of observed reflections constitutes the diffraction pattern that encodes the crystal structure.
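The recipe for Miller indices (take reciprocals of the fractional intercepts, then scale to the smallest integers) can be sketched in a few lines, together with the standard cubic spacing d = a / √(h² + k² + l²). The function names are hypothetical, and using None to mark a plane parallel to an axis is a convention adopted only for this example.

```python
import math
from fractions import Fraction
from functools import reduce

def miller_indices(intercepts):
    """Miller indices from fractional intercepts; None means parallel to that axis."""
    recips = [Fraction(0) if t is None else 1 / Fraction(t) for t in intercepts]
    # Clear denominators, then divide out any common factor.
    lcm = reduce(lambda m, r: m * r.denominator // math.gcd(m, r.denominator), recips, 1)
    ints = [int(r * lcm) for r in recips]
    common = reduce(math.gcd, (abs(i) for i in ints))
    return tuple(i // common for i in ints)

def d_cubic(a, h, k, l):
    """Interplanar spacing for a cubic lattice with edge a."""
    return a / math.sqrt(h * h + k * k + l * l)

# The worked example from the text: intercepts (1/2, 1/3, infinity)
print(miller_indices((Fraction(1, 2), Fraction(1, 3), None)))  # (2, 3, 0)
print(d_cubic(4.0, 2, 3, 0))
```

Exact fractions are used for the intercepts so that values such as 1/3 are not distorted by floating-point rounding.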
The reciprocal lattice is the mathematical dual of the direct lattice: each point in reciprocal space with coordinates (h, k, l) corresponds to one family of planes in real space. The reciprocal lattice vectors a*, b*, c* satisfy a·a* = 1, a·b* = 0, etc. Diffraction occurs when the scattering vector equals a reciprocal lattice vector — this is the Laue condition, equivalent to Bragg's law.
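A small sketch of the reciprocal lattice construction, assuming the crystallographic convention stated above (a·a* = 1, with no factor of 2π). The helper names are hypothetical. It also uses the general relation d = 1/|h·a* + k·b* + l·c*|, which holds for any lattice, not only cubic ones.

```python
import math

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def reciprocal_vectors(a, b, c):
    """Return (a*, b*, c*) for direct lattice vectors a, b, c."""
    volume = dot(a, cross(b, c))
    a_star = tuple(x / volume for x in cross(b, c))
    b_star = tuple(x / volume for x in cross(c, a))
    c_star = tuple(x / volume for x in cross(a, b))
    return a_star, b_star, c_star

def d_spacing(hkl, a, b, c):
    """Interplanar spacing as the inverse length of the reciprocal lattice vector."""
    a_s, b_s, c_s = reciprocal_vectors(a, b, c)
    g = tuple(hkl[0] * p + hkl[1] * q + hkl[2] * r for p, q, r in zip(a_s, b_s, c_s))
    return 1.0 / math.sqrt(dot(g, g))

# Orthorhombic cell with edges 3, 4, 5 Å
a, b, c = (3.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 5.0)
print(reciprocal_vectors(a, b, c))
print(d_spacing((1, 1, 0), a, b, c))
```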
3. Bragg's Law: 2d·sinθ = nλ
In 1913, William Henry Bragg and his son William Lawrence Bragg derived a remarkably simple condition for constructive interference of X-rays scattered by a crystal. Treating diffraction as reflection from crystal planes, they showed that a sharp diffraction spot (Bragg reflection) is observed only when:
2d·sinθ = nλ
where:
- d — interplanar spacing of the (hkl) planes (in angstroms or nanometres)
- θ — the angle between the incident X-ray beam and the reflecting plane (not the normal to the plane)
- n — the order of diffraction (positive integer; usually n = 1 since higher orders correspond to planes with d/n spacing)
- λ — X-ray wavelength (typically 0.5–2.5 Å for crystallography; characteristic copper Kα radiation is 1.5418 Å)
The physical picture: X-rays scattered from successive parallel planes differ in path length by 2d·sinθ. When this difference equals an integer number of wavelengths, waves scattered from all planes add constructively. When the condition is not met, waves from the many thousands of planes in a real crystal cancel by destructive interference, and no scattered intensity is observed.
Bragg's law immediately explains why atomic-scale structures require X-rays rather than visible light. Because sinθ cannot exceed 1, a reflection from planes of spacing d is possible only if λ ≤ 2d. To probe d-spacings of 1–10 Å, the wavelength must therefore be comparable, placing the required radiation firmly in the X-ray regime (roughly 0.01–10 nm). Visible light (400–700 nm) cannot resolve atomic spacings.
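A minimal sketch of Bragg's law solved for the angle, with a hypothetical function name. It returns None when nλ/(2d) exceeds 1, which is the case for visible light on atomic-scale spacings.

```python
import math

def bragg_angle_deg(d, wavelength, n=1):
    """Bragg angle θ in degrees for spacing d and wavelength (same units), or None."""
    s = n * wavelength / (2.0 * d)
    if s > 1.0:
        return None  # no angle satisfies 2d·sinθ = nλ
    return math.degrees(math.asin(s))

cu_k_alpha = 1.5418  # Å, the copper Kα wavelength quoted above
print(bragg_angle_deg(2.0, cu_k_alpha))   # an X-ray reflection exists
print(bragg_angle_deg(2.0, 5000.0))       # green light, 5000 Å: None
```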
4. The Structure Factor F(hkl)
Bragg's law tells us when a reflection occurs, but not how strong it is. The intensity of each reflection depends on the arrangement of atoms within the unit cell. This is quantified by the structure factor F(hkl), a complex number whose squared modulus is proportional to the intensity:
F(hkl) = Σⱼ fⱼ exp[2πi(hxⱼ + kyⱼ + lzⱼ)]
where the sum runs over all atoms j in the unit cell, each at fractional coordinates (xⱼ, yⱼ, zⱼ), and:
- fⱼ — atomic scattering factor of atom j; proportional to the number of electrons (heavier atoms scatter more strongly). It also depends on sinθ/λ due to the finite size of the electron cloud.
- exp[2πi(hxⱼ + kyⱼ + lzⱼ)] — phase factor encoding the position of atom j; the dot product of the Miller index vector (h,k,l) with the atomic position (xⱼ,yⱼ,zⱼ) determines the phase contribution
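A minimal sketch of the structure factor sum, F(hkl) = Σ fⱼ exp[2πi(hxⱼ + kyⱼ + lzⱼ)], using only the standard library. The function name is hypothetical, and the scattering factors are treated as constants, which ignores their dependence on sinθ/λ. The body-centred example shows how atom positions alone can make a reflection vanish.

```python
import cmath

def structure_factor(hkl, atoms):
    """atoms is a list of (f, (x, y, z)) with fractional coordinates."""
    h, k, l = hkl
    return sum(f * cmath.exp(2j * cmath.pi * (h * x + k * y + l * z))
               for f, (x, y, z) in atoms)

# Body-centred cell: identical atoms at the corner and at the body centre
bcc = [(1.0, (0.0, 0.0, 0.0)), (1.0, (0.5, 0.5, 0.5))]

for hkl in [(1, 0, 0), (1, 1, 0), (1, 1, 1), (2, 0, 0)]:
    print(hkl, round(abs(structure_factor(hkl, bcc)), 6))
```

Reflections with h + k + l odd come out with zero amplitude, because the two atoms scatter exactly out of phase.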
The measured intensity of reflection (hkl) is proportional to the squared modulus of the structure factor:
I(hkl) ∝ |F(hkl)|²
Several correction factors apply in practice: the Lorentz factor (a geometric correction for the time each reflection spends in the diffracting condition), the polarisation factor (the scattered X-rays are partly polarised), and the Debye-Waller factor, governed by the temperature factor B, which accounts for thermal motion. Atoms vibrating around their equilibrium positions effectively smear their electron density, attenuating high-angle reflections:
fⱼ(thermal) = fⱼ · exp(−Bⱼ sin²θ / λ²)
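To give a feel for the size of the thermal attenuation, here is a sketch assuming the common isotropic form exp(−B·sin²θ/λ²), with B in Å². The function name and the B value of 20 Å² are illustrative assumptions, not values taken from the text.

```python
import math

def debye_waller_factor(b_factor, theta_deg, wavelength):
    """Attenuation of a scattering factor for isotropic B (Å²), θ in degrees, λ in Å."""
    s = math.sin(math.radians(theta_deg)) / wavelength  # sinθ/λ, equal to 1/(2d)
    return math.exp(-b_factor * s * s)

# Attenuation grows quickly with angle for a fixed B
for theta in (5, 15, 30):
    print(theta, debye_waller_factor(20.0, theta, 1.0))
```

Because intensity scales with the square of the amplitude, the measured intensities fall off twice as fast in the exponent.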
5. Electron Density and Fourier Transforms
The structure factors F(hkl) are the Fourier coefficients of the electron density ρ(x, y, z) in the unit cell. The electron density, the actual physical quantity we want to determine, is recovered by an inverse Fourier transform:
ρ(x, y, z) = (1/V) Σₕₖₗ F(hkl) exp[−2πi(hx + ky + lz)]
where V is the unit cell volume and the sum runs over all measured reflections. The forward transform connects real-space density to diffraction-space structure factors; the inverse transform goes back. This reciprocal relationship is central to crystallography.
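The forward and inverse transforms can be tried out in one dimension. The sketch below is a simplified toy, not a real map calculation: it assumes point atoms on a 1D cell of unit length, builds F(h) from them, and then sums the Fourier series back into a density. All names are hypothetical.

```python
import cmath

def structure_factors_1d(atoms, h_max):
    """F(h) for h = -h_max..h_max from (f, x) point atoms on a 1D cell."""
    return {h: sum(f * cmath.exp(2j * cmath.pi * h * x) for f, x in atoms)
            for h in range(-h_max, h_max + 1)}

def density_1d(F, x, length=1.0):
    """Inverse Fourier sum: rho(x) = (1/L) * sum over h of F(h) exp(-2πi·h·x)."""
    total = sum(Fh * cmath.exp(-2j * cmath.pi * h * x) for h, Fh in F.items())
    return total.real / length

atoms = [(6, 0.2), (8, 0.7)]          # a lighter and a heavier atom
F = structure_factors_1d(atoms, h_max=20)

for x in (0.2, 0.45, 0.7):
    print(x, round(density_1d(F, x), 3))
```

The density peaks at the two atomic positions, with the heavier atom giving the higher peak. Lowering h_max broadens the peaks, which is the one-dimensional analogue of working at lower resolution.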
High-resolution electron density maps (computed from many thousands of reflections out to large sinθ/λ) show clearly resolved peaks at atomic positions. At 2 Å resolution, the shapes of groups of atoms are clear, but individual atoms are not separately resolved. At around 1 Å resolution, individual atoms appear as well-separated peaks and bonding electron density between atoms starts to become visible. Below 0.5 Å ("ultra-high resolution"), lone pairs and bonding electrons can be mapped.
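In practice, the electron density map is computed on a grid, contoured at a chosen number of standard deviations above the mean density, and fitted with an atomic model. The model is then refined iteratively, typically by least squares, and its agreement with the observed structure factor amplitudes is summarised by the R-factor:
R = Σ | |Fobs| − |Fcalc| | / Σ |Fobs|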
Good small-molecule structures achieve R ~ 0.03–0.05; protein structures typically converge to R ~ 0.15–0.25, with the free R-factor (calculated on a withheld test set) used to monitor overfitting.
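A minimal sketch of the conventional R-factor, R = Σ | |Fobs| − |Fcalc| | / Σ |Fobs|. The function name is hypothetical, and the example assumes the observed and calculated amplitudes are already on the same scale, which real refinement programs have to establish first.

```python
def r_factor(f_obs, f_calc):
    """Conventional crystallographic R-factor from paired amplitude lists."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator

observed = [10.0, 20.0]
calculated = [9.0, 22.0]
print(r_factor(observed, calculated))  # (1 + 2) / 30
```

The free R-factor uses the same formula, applied only to the reflections that were withheld from refinement.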
6. The Phase Problem
Here lies the central obstacle of crystallography: a diffraction experiment measures intensities I(hkl) ∝ |F(hkl)|², which gives us the amplitude |F(hkl)| of each structure factor. But F(hkl) is a complex number — it has both amplitude and phase angle φ(hkl). The Fourier transform that recovers the electron density requires both.
Because detectors measure intensities (power), the phase information is lost. Without phases, the inverse Fourier transform cannot be computed and the structure cannot be determined. This is the phase problem, the fundamental barrier in crystallography, and solving it is the art of the discipline.
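The loss of information can be seen in a toy calculation. The sketch below assumes point atoms in a 1D cell and uses hypothetical names: shifting the origin of the same structure changes every phase but leaves every amplitude, and hence every measurable intensity, unchanged.

```python
import cmath

def structure_factor_1d(atoms, h):
    """F(h) for (f, x) point atoms on a 1D cell."""
    return sum(f * cmath.exp(2j * cmath.pi * h * x) for f, x in atoms)

original = [(6, 0.10), (8, 0.35), (7, 0.60)]
shifted = [(f, (x + 0.25) % 1.0) for f, x in original]  # same structure, new origin

for h in range(1, 4):
    a = structure_factor_1d(original, h)
    b = structure_factor_1d(shifted, h)
    print(h,
          round(abs(a), 4), round(abs(b), 4),                   # identical amplitudes
          round(cmath.phase(a), 4), round(cmath.phase(b), 4))   # different phases
```

An intensity measurement records only the first pair of columns, so it cannot distinguish between the two sets of phases.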
Three main strategies solve the phase problem:
- Direct methods: exploit statistical relationships among structure factor phases imposed by the requirement that electron density cannot be negative anywhere. They work for small molecules (up to ~200 atoms) and are used in the SHELX program suite. Jerome Karle and Herbert Hauptman won the 1985 Nobel Prize in Chemistry for developing direct methods.
- Isomorphous replacement: soak the crystal in a heavy-atom solution (platinum, mercury, gold). The heavy atoms scatter strongly and alter intensities predictably. By comparing native and derivative diffraction patterns, phases can be calculated. Max Perutz and colleagues pioneered this approach for proteins.
- Molecular replacement: if a homologous structure is already known, it can be positioned (rotated and translated) in the unit cell of the new crystal to provide initial phases. This is the most widely used method today, since the PDB now contains so many reference structures.
7. Patterson Functions and Phasing Methods
Before direct methods were developed, the Patterson function provided a phase-free way to extract information about interatomic vectors. Arthur Lindo Patterson (1935) showed that a Fourier synthesis using the measured intensities |F(hkl)|² as coefficients, rather than the structure factors themselves, gives a useful quantity:
P(u, v, w) = (1/V) Σₕₖₗ |F(hkl)|² cos[2π(hu + kv + lw)]
The Patterson function has a peak at (u, v, w) whenever two atoms in the structure are separated by the vector (u, v, w). It is the autocorrelation of the electron density. For a structure with N atoms, the Patterson map has N² − N peaks (excluding the origin). For small structures, these peaks can be interpreted directly; for larger structures, the map becomes too crowded.
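The peak count can be checked by listing the interatomic vectors directly. This sketch does not compute a Patterson map from intensities; it simply enumerates where the peaks are expected for a hypothetical three-atom cell, and weights each one by the product of the two electron counts, the usual approximation for Patterson peak height.

```python
from itertools import permutations

def patterson_vectors(atoms):
    """Expected Patterson peaks as (weight, vector) for atoms given as (Z, (x, y, z))."""
    peaks = []
    for (zi, ri), (zj, rj) in permutations(atoms, 2):   # ordered pairs, no self-vectors
        vector = tuple(round((b - a) % 1.0, 6) for a, b in zip(ri, rj))
        peaks.append((zi * zj, vector))
    return sorted(peaks, reverse=True)

atoms = [
    (80, (0.1, 0.2, 0.3)),   # Hg
    (80, (0.6, 0.4, 0.9)),   # Hg
    (8,  (0.3, 0.3, 0.3)),   # O
]
for weight, vector in patterson_vectors(atoms):
    print(weight, vector)
```

Three atoms give 3² − 3 = 6 non-origin peaks, and the two Hg-Hg vectors outweigh the Hg-O vectors by a factor of ten.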
The Patterson approach is particularly powerful for locating heavy atoms. A mercury atom has 80 electrons, against roughly 6–8 for the carbon, nitrogen, and oxygen atoms that make up most of a protein, and Patterson peak heights scale with the product of the electron counts of the two atoms involved, so Hg-Hg vectors produce very prominent peaks. Once heavy atom positions are found, their contribution to the phases can be calculated and used to phase the entire dataset by multiple isomorphous replacement (MIR).
Anomalous dispersion (SAD/MAD phasing) exploits the fact that at X-ray energies near an absorption edge of a heavy element, the atomic scattering factor acquires an imaginary component. This breaks Friedel's law (normally |F(hkl)| = |F(−h,−k,−l)|) and provides phase information from differences between Friedel pairs. Selenium, incorporated into proteins as selenomethionine, has its K absorption edge (near 0.98 Å) conveniently within the range of synchrotron beamlines, making Se-SAD the dominant experimental phasing method at modern synchrotrons.
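The breakdown of Friedel's law can be reproduced in a toy 1D cell. In the sketch below the complex scattering factor 30 + 4j for the anomalous scatterer is an illustrative assumption, not a tabulated value for selenium, and the function name is hypothetical.

```python
import cmath

def structure_factor_1d(atoms, h):
    """F(h) for (f, x) point atoms; f may be complex near an absorption edge."""
    return sum(f * cmath.exp(2j * cmath.pi * h * x) for f, x in atoms)

normal = [(6, 0.10), (8, 0.35), (30, 0.60)]          # all scattering factors real
anomalous = [(6, 0.10), (8, 0.35), (30 + 4j, 0.60)]  # imaginary part on one atom

for label, atoms in (("real f", normal), ("complex f", anomalous)):
    plus = abs(structure_factor_1d(atoms, 1))
    minus = abs(structure_factor_1d(atoms, -1))
    print(label, round(plus, 3), round(minus, 3))
```

With purely real scattering factors the two amplitudes agree; once one atom scatters anomalously they differ, and that difference is the signal SAD and MAD phasing rely on.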
8. Synchrotron Radiation and Modern Beamlines
Laboratory X-ray sources (sealed tubes and rotating anode generators) emit characteristic radiation at fixed wavelengths. Synchrotron radiation is far more powerful: electrons or positrons circulating at relativistic speeds in a storage ring emit extremely intense, highly collimated, tunable X-ray beams as a consequence of their centripetal acceleration.
Key advantages of synchrotron sources over laboratory sources:
- Brilliance: beams 10⁸–10¹² times brighter. A dataset that requires hours or days on a lab source can take seconds to minutes at a synchrotron.
- Tunability: wavelength is continuously adjustable across the X-ray range, enabling anomalous dispersion phasing at element-specific absorption edges
- Collimation and focusing: beams as small as 1 μm in diameter, enabling micro-crystallography of tiny crystals and serial crystallography
- Time resolution: pump-probe experiments can follow chemical reactions in crystals on timescales from microseconds down to about 100 picoseconds, the duration of a single synchrotron X-ray pulse
Modern macromolecular crystallography (MX) beamlines at facilities such as Diamond Light Source (UK), ESRF (France), APS (USA), and SPring-8 (Japan) are highly automated: robotic sample changers swap crystals, diffractometers auto-centre, and data processing pipelines run in real time. A skilled crystallographer can collect complete datasets from dozens of crystals in a single shift.
X-ray Free Electron Lasers (XFELs) — at LCLS (Stanford) and European XFEL (Hamburg) — push the frontier further: pulses of ~10 fs duration and extreme peak brightness enable "diffraction before destruction," collecting a diffraction pattern from a single nanocrystal before the X-ray pulse destroys it. Serial femtosecond crystallography (SFX) can solve structures of radiation-sensitive samples and capture enzymatic intermediates too transient for conventional methods.
9. Protein Crystallography and the PDB
Proteins are the molecular machines of life — enzymes, motors, transporters, receptors. Understanding their function requires knowing their three-dimensional shape at atomic resolution. Protein crystallography provided the first such views: myoglobin (John Kendrew, 1958), haemoglobin (Max Perutz, 1960), and lysozyme (David Phillips, 1965). Kendrew and Perutz shared the 1962 Nobel Prize in Chemistry.
Protein crystallography presents challenges not encountered with small molecules:
- Crystal growth: proteins must be crystallised from solution, typically by slowly increasing the concentration of precipitants (ammonium sulphate, PEG). Crystal quality is highly sensitive to pH, temperature, additives, and protein purity. Growing diffraction-quality crystals remains the rate-limiting step for many targets.
- Size: a typical protein of 300 amino acids has ~2,400 non-hydrogen atoms. The asymmetric unit may contain several protein copies, drug molecules, and ordered water molecules, which can add up to tens of thousands of atoms.
- Radiation damage: intense X-ray beams generate free radicals that damage crystals during data collection. Cryocooling crystals to about 100 K in a stream of cold nitrogen gas dramatically reduces radiation damage and is now standard practice.
- Resolution: protein crystals diffract to lower resolution than small-molecule crystals, typically 1.5–3.5 Å, because they contain ~50% disordered solvent and are held together by few, weak intermolecular contacts, which limits how well ordered the lattice can be.
At 2 Å resolution, amino acid side chains are well resolved, water molecules are visible, and ligand binding geometry can be accurately determined — sufficient for structure-based drug design. At 1.2 Å (atomic resolution), hydrogen atoms become visible and charge densities can be mapped. At the current practical limit of ~0.8 Å in favourable cases, bond lengths and angles approach the accuracy of quantum chemical calculations.
The Protein Data Bank (PDB), founded in 1971 with seven structures, now contains over 220,000 macromolecular structures (as of 2026). Approximately 85% were determined by X-ray crystallography; the remainder by cryo-electron microscopy and NMR. Major journals require that coordinates and experimental data be deposited in the PDB before a structural paper is published, making it one of the most valuable open-access scientific resources in existence.