How CRISPR-Cas9 Works
A bacterial immune system, repurposed as a genome-editing tool in 2012 and refined since, lets scientists rewrite the genome with something close to the precision of a word processor: find almost any chosen 20-letter sequence among three billion base pairs, cut both DNA strands, and steer how the cell repairs the break. This is CRISPR-Cas9, one of the most transformative molecular biology tools in decades.
Bacterial Immune Memory: Where CRISPR Came From
In 1987, Japanese scientists noticed a strange pattern in bacterial genomes: repeated DNA sequences separated by unique "spacer" sequences. The acronym CRISPR — Clustered Regularly Interspaced Short Palindromic Repeats — was coined in 2002.
It took until 2007 to understand what these sequences do. When a bacterium survives a viral infection, it cuts a short fragment of the virus's DNA and inserts it as a new spacer in its CRISPR array. This is a molecular memory of past infections. During a future infection, the bacterium transcribes these spacers into short RNA molecules and turns them into a surveillance system that destroys any DNA matching the stored sequence.
The Two Components: gRNA and Cas9
The CRISPR-Cas9 editing system requires just two molecular components:
- Single Guide RNA (sgRNA / gRNA): A short synthetic RNA molecule (~100 nucleotides) with two sections. The scaffold section folds into a structure that binds and recruits the Cas9 protein. The spacer section — 20 nucleotides chosen by the researcher — acts as a targeting address, complementary to the DNA sequence to be edited.
- Cas9 protein: A large enzyme (~1368 amino acids) that acts as molecular scissors. It has two nuclease domains, RuvC and HNH, each cutting one strand of the DNA double helix. Together they produce a double-strand break (DSB).
To design an edit, a researcher simply synthesises a 20-nucleotide guide sequence matching the target region and assembles it into the sgRNA scaffold. The same Cas9 protein is reused for every experiment — only the guide RNA changes.
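The design step above can be sketched in a few lines of Python. This is a minimal illustration, not a validated design tool: the function names are hypothetical, the protospacer is assumed to be written 5'→3' on the strand that carries the PAM, and the scaffold is passed in by the caller because no scaffold sequence is given here.

```python
def spacer_rna_from_target(protospacer_dna: str) -> str:
    """Return the 20-nt RNA spacer for a protospacer.

    Assumes the protospacer is written 5'->3' on the PAM-containing strand,
    so the spacer has the same sequence with U in place of T.
    """
    seq = protospacer_dna.upper()
    if len(seq) != 20:
        raise ValueError("expected a 20-nucleotide protospacer")
    if set(seq) - set("ACGT"):
        raise ValueError("protospacer must contain only A, C, G, T")
    return seq.replace("T", "U")


def assemble_sgrna(protospacer_dna: str, scaffold_rna: str) -> str:
    """Join the spacer (5' end) to a caller-supplied scaffold (3' end)."""
    return spacer_rna_from_target(protospacer_dna) + scaffold_rna
```

Only the first argument changes between experiments; the scaffold argument stays the same, which mirrors the point that the guide is the only component that has to be redesigned.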
The PAM Sequence: The Address Lock
Cas9 cannot cut DNA at an arbitrary location: the Cas9 protein must first recognise a short sequence next to the target called the PAM (Protospacer Adjacent Motif). For the most common CRISPR tool (S. pyogenes Cas9), the PAM sequence is NGG (any nucleotide followed by two guanines), which must appear immediately 3′ of the target sequence on the non-target strand (the strand that does not pair with the guide RNA).
The PAM requirement actually has a biological purpose: it prevents Cas9 from cutting the CRISPR array itself (where the spacers are stored), because those sequences are flanked by repeat sequences, not NGG PAMs.
The NGG PAM appears roughly every 8–12 base pairs in a typical genome, giving many potential cut sites near any gene of interest. Other Cas enzymes and engineered variants (for example Cas12a and SpRY) recognise different PAMs or are nearly PAM-less, greatly expanding the targetable genome.
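To make the PAM rule concrete, the sketch below scans a DNA string for candidate S. pyogenes Cas9 sites: a 20-nucleotide stretch followed immediately by NGG, checked on both strands. The function names are illustrative, and positions on the minus strand are reported in reverse-complement coordinates to keep the code short.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_spcas9_sites(dna: str, spacer_len: int = 20):
    """List (strand, protospacer, pam, offset) for every NGG-adjacent site.

    offset is the 0-based start of the protospacer in the 5'->3' sequence
    of the strand being scanned.
    """
    dna = dna.upper()
    # A lookahead is used so that overlapping candidate sites are all found.
    pattern = re.compile(rf"(?=([ACGT]{{{spacer_len}}})([ACGT]GG))")
    sites = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for match in pattern.finditer(seq):
            sites.append((strand, match.group(1), match.group(2), match.start()))
    return sites
```

Running it on a gene region returns every position where a guide could in principle be placed; choosing among them still depends on efficiency and specificity considerations that this sketch ignores.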
The Cutting Mechanism Step-by-Step
DNA Repair: NHEJ vs HDR
When Cas9 cuts the DNA, the cell's own repair mechanisms seal the break. The outcome of editing depends entirely on which repair pathway activates:
| Pathway | Mechanism | Result | Use for |
|---|---|---|---|
| NHEJ (Non-Homologous End Joining) | Ligates the two ends directly without a template. Prone to insertions or deletions (indels) of 1–20+ nucleotides. | Disrupted (frameshifted / truncated) gene, usually a knockout | Disabling a gene; modelling loss-of-function disease |
| HDR (Homology-Directed Repair) | Uses a provided DNA template with matching flanking sequences ("homology arms") to rewrite the cut site precisely. | Precise edit: base change, insertion of sequence, correction of mutation | Correcting a disease mutation; inserting a reporter gene |
NHEJ is active in all cell types and throughout most of the cell cycle. HDR is much more limited: it is largely restricted to the S and G2 phases (when a sister chromatid is available as a repair template) and is generally 10–100× less efficient than NHEJ. Researchers use various tricks (small molecules, cell synchronisation, modified DNA templates) to bias cells toward HDR when a precise edit is needed.
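Why NHEJ indels usually knock a gene out comes down to arithmetic on the reading frame: any net length change that is not a multiple of three shifts every downstream codon. The sketch below shows this with hypothetical helper names. The uniform spread of indel sizes used in the usage note is purely an illustrative assumption, since real indel distributions are skewed.

```python
def classify_indel(net_length_change: int) -> str:
    """Classify an indel by its net effect on coding-sequence length.

    Positive values are insertions, negative values are deletions.
    """
    if net_length_change == 0:
        return "no net length change"
    return "in-frame" if net_length_change % 3 == 0 else "frameshift"


def frameshift_fraction(length_changes) -> float:
    """Fraction of the given non-zero indels that shift the reading frame."""
    changes = [c for c in length_changes if c != 0]
    if not changes:
        raise ValueError("need at least one non-zero length change")
    shifted = sum(1 for c in changes if classify_indel(c) == "frameshift")
    return shifted / len(changes)
```

For insertions of 1 to 20 nucleotides taken once each, `frameshift_fraction(range(1, 21))` gives 0.7, because only six of those sizes are multiples of three.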
Delivery: Getting CRISPR into Cells
The CRISPR components (Cas9 protein + gRNA, or their encoding DNA/RNA) must cross the cell membrane and reach the nucleus. Delivery is one of the biggest engineering challenges in the field:
- Electroporation: Brief electric pulses create transient pores in the membrane, which makes it efficient for cells in culture. The standard for ex vivo editing (remove cells from patient → edit → reinfuse), and the approach used for the first approved CRISPR therapy (Casgevy, 2023).
- Lipid nanoparticles (LNPs): Fatty envelopes that encapsulate the cargo and carry it into cells. Effective for liver delivery in vivo.
- Adeno-associated virus (AAV): Engineered viral vectors stripped of their own viral genes. Wide tissue tropism, but Cas9 is almost too large for a single AAV capsid (capacity ~4.7 kb; full-length Cas9 ~4.2 kb + promoter = tight fit).
- Ribonucleoprotein (RNP) complexes: Pre-assembled complexes of Cas9 protein and gRNA. Transient (reduced off-target risk); the bound Cas9 gives the guide RNA some protection from nucleases, although the complex is not immune to degradation.
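The AAV size constraint is easy to check with simple arithmetic. The sketch below uses the approximate capacity and Cas9 length quoted in the list above; the promoter, polyA and sgRNA cassette sizes are hypothetical round numbers chosen only to show how quickly the budget runs out.

```python
from typing import Mapping

AAV_CAPACITY_BP = 4700  # approximate packaging limit quoted in the text
SPCAS9_CDS_BP = 4200    # approximate SpCas9 coding length quoted in the text


def remaining_capacity(elements: Mapping[str, int],
                       capacity_bp: int = AAV_CAPACITY_BP) -> int:
    """Return base pairs left after packaging all elements (negative = too big)."""
    return capacity_bp - sum(elements.values())


cargo = {
    "SpCas9 coding sequence": SPCAS9_CDS_BP,
    "promoter (hypothetical size)": 300,
    "polyA signal (hypothetical size)": 100,
}
print(remaining_capacity(cargo))  # 100 bp of headroom

cargo["sgRNA cassette (hypothetical size)"] = 400
print(remaining_capacity(cargo))  # -300: no longer fits in one capsid
```

Under these assumed sizes, adding a guide cassette to the same vector overshoots the limit, which is why smaller Cas9 orthologues or two-vector designs are often considered.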
Off-Target Effects and Specificity
Cas9 can tolerate mismatches between the guide RNA and the DNA target, particularly in the PAM-distal part of the guide (roughly the first 8 nucleotides from the 5′ end), which is less critical for binding than the PAM-proximal "seed" region (roughly the 10–12 nucleotides next to the PAM). A guide RNA can therefore cause unintended cuts at off-target sites with similar sequences elsewhere in the genome.
Mitigation strategies:
- High-fidelity Cas9 variants (eSpCas9, HiFi Cas9) — engineered mutations reduce non-specific DNA binding and off-target activity 10–100×.
- Paired nickases: Two Cas9 nickases (each with one nuclease domain inactivated) are directed to adjacent sites on opposite strands; only together do they create a DSB. An isolated off-target nick is usually repaired accurately, so off-target DSBs are dramatically reduced.
- Transient delivery (RNP or mRNA): Cas9 is cleared within hours to a few days, limiting exposure time compared to stably integrated Cas9-expressing constructs.
- Bioinformatic guide design: Tools like Cas-OFFinder predict off-target sites; guides are chosen so that the closest matches elsewhere in the genome still carry several mismatches.
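A first-pass specificity check can be sketched as a mismatch count between a guide and a candidate genomic site. The code below is a simplified illustration with hypothetical function names, not how Cas-OFFinder or any other tool works. It reports PAM-proximal and PAM-distal mismatches separately, because mismatches near the PAM are generally the least tolerated; the 12-nucleotide seed length is an assumed default.

```python
def mismatch_positions(guide: str, site: str) -> list:
    """Return 1-based mismatch positions, counted from the PAM-distal (5') end.

    Both sequences are written as DNA letters on the PAM-containing strand.
    """
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    pairs = zip(guide.upper(), site.upper())
    return [i + 1 for i, (g, s) in enumerate(pairs) if g != s]


def summarize_site(guide: str, site: str, seed_len: int = 12) -> dict:
    """Count mismatches overall, in the PAM-proximal seed, and outside it."""
    positions = mismatch_positions(guide, site)
    seed_start = len(guide) - seed_len + 1  # first position inside the seed
    proximal = sum(1 for p in positions if p >= seed_start)
    return {
        "total": len(positions),
        "pam_proximal": proximal,
        "pam_distal": len(positions) - proximal,
    }
```

A candidate site with few mismatches, all of them PAM-distal, would be flagged as the more worrying kind of off-target under this simple model.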
Applications
Next-Generation Tools: Base Editing and Prime Editing
Classical CRISPR-Cas9 causes double-strand breaks, which are repaired imprecisely or require an HDR template. Newer tools avoid double-strand breaks, at most nicking a single strand:
Base Editing (2016 — David Liu's lab)
A catalytically-impaired Cas9 ("nickase" or "dead" Cas9) is fused to a deaminase enzyme. The complex localises to the target but instead of cutting, directly converts one base into another within a 4–8 nucleotide editing window:
- Cytosine base editors (CBE): C → T conversions
- Adenine base editors (ABE): A → G conversions
Together they cover all four transition mutations (C→T and G→A with a CBE, A→G and T→C with an ABE, depending on which strand is targeted), which account for more than half of known pathogenic point mutations.
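The editing window can be illustrated with a toy model that converts every target base inside the window and leaves the rest of the protospacer unchanged. The window of positions 4 to 8 (counted from the PAM-distal end) is an assumed default, and real editors have sequence-context preferences that this sketch ignores.

```python
CONVERSIONS = {
    "CBE": ("C", "T"),  # cytosine base editor
    "ABE": ("A", "G"),  # adenine base editor
}


def base_edit(protospacer: str, editor: str, window=(4, 8)) -> str:
    """Apply an idealised base edit to a protospacer written 5'->3'.

    Every source base at a position inside the inclusive 1-based window is
    converted, so bystander edits within the window are included.
    """
    source, product = CONVERSIONS[editor]
    start, end = window
    edited = []
    for position, base in enumerate(protospacer.upper(), start=1):
        in_window = start <= position <= end
        edited.append(product if in_window and base == source else base)
    return "".join(edited)


print(base_edit("GACCCATTGCAGTCCTAGGA", "CBE"))  # GACTTATTGCAGTCCTAGGA
print(base_edit("GACCCATTGCAGTCCTAGGA", "ABE"))  # GACCCGTTGCAGTCCTAGGA
```

In the CBE example both cytosines inside the window change while the cytosine at position 3 does not, which shows why guide placement relative to the target base matters.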
Prime Editing (2019 — David Liu's lab)
Prime editing uses a nicking Cas9 fused to a reverse transcriptase. A special guide RNA (pegRNA) encodes both the targeting sequence and the desired edit. The reverse transcriptase uses this as a template to write new sequence directly into the genome, without a DSB and without needing an external repair template. Prime editing can in principle make all 12 types of point mutations plus small insertions and deletions, with fewer off-target effects than standard Cas9 nuclease editing.
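The figure of 12 point mutations is simple counting: each of the four bases can change to any of the other three. The short sketch below enumerates them and separates the four transitions (purine to purine, or pyrimidine to pyrimidine) from the eight transversions; the names are illustrative.

```python
from itertools import permutations

PURINES = {"A", "G"}


def substitution_type(ref: str, alt: str) -> str:
    """Classify a single-base substitution as a transition or transversion."""
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


substitutions = list(permutations("ACGT", 2))  # every ordered pair ref -> alt
transitions = [s for s in substitutions if substitution_type(*s) == "transition"]
transversions = [s for s in substitutions if substitution_type(*s) == "transversion"]

print(len(substitutions), len(transitions), len(transversions))  # 12 4 8
```

The four transitions are the ones reachable with base editors, while the remaining eight transversions are changes that need an approach such as prime editing.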