Spotlight #64 – Quantum, Nanotech & AI: the Science Behind Our Newest Simulations

Four simulations from Wave 98 sit at the frontier of modern science: glowing semiconductor nanocrystals whose colour you can tune by changing their size, amphiphilic molecules that spontaneously organise into cell membranes, a machine-learning tree that carves data space into decisions, and a reinforcement-learning agent that teaches itself to navigate by trial and error. This spotlight opens the hood on each one.

I. Quantum Dots — Colour from Confinement

🔵

Quantum Dots — Brus Equation & Size-Tunable Emission

Drag the radius slider from 1–10 nm and watch the dot glow shift from blue to red in real time. Shows conduction/valence band diagram, Bohr radius threshold, and material presets.

A piece of bulk cadmium selenide is orange. Grind it down to a 3 nm crystal and it glows blue-green. Grind it to 6 nm and it glows red. The colour has nothing to do with chemical composition — the crystal is the same material both times. What changes is the degree of quantum confinement.

In a bulk semiconductor, electrons and holes exist in continuous energy bands, and the gap between those bands — the bandgap — is a fixed property of the crystal structure. But when you shrink the crystal below a characteristic length called the exciton Bohr radius, the electron and hole are squeezed into a box so small that the particle-in-a-box quantisation of their kinetic energy becomes significant. The effective bandgap widens. A wider gap means higher-energy photons emitted when the exciton recombines — shorter wavelength, bluer light.

The Brus equation captures this size-dependence in a two-term correction on top of the bulk bandgap. The first term is the quantum confinement energy, treating the electron and hole as independent particles in a spherical potential well and weighting by their effective masses. The second term is a negative Coulomb correction — the electron and hole attract each other, lowering the energy slightly, with the effect scaling inversely with radius because confinement forces them closer together.

E_gap(r) = E_bulk + h^2/(8r^2) * (1/m_e* + 1/m_h*) - 1.8 * e^2/(4*pi*eps*eps_0*r)

  E_bulk : bulk bandgap (e.g. 1.74 eV for CdSe)
  r      : nanocrystal radius (m)
  m_e*   : effective electron mass (quoted as a fraction of the free-electron mass m_0; multiply by m_0 when evaluating)
  m_h*   : effective hole mass (same convention)
  eps    : relative permittivity of the semiconductor (~10 for CdSe)
  First term  : confinement energy (raises gap)
  Second term : Coulomb correction (lowers gap)

lambda_emit = h*c / E_gap(r)     (emission wavelength)
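The two formulas above can be evaluated directly. The following is a minimal Python sketch, not the simulation's own code: the function names are hypothetical, and the CdSe effective masses (0.13 and 0.45 in units of the free-electron mass) are commonly quoted values assumed here for illustration. E_bulk = 1.74 eV and eps = 10 follow the definitions above.

```python
import math

H = 6.62607015e-34           # Planck constant, J*s
C = 2.99792458e8             # speed of light, m/s
E_CHARGE = 1.602176634e-19   # elementary charge, C
EPS_0 = 8.8541878128e-12     # vacuum permittivity, F/m
M_0 = 9.1093837015e-31       # free-electron mass, kg

# Illustrative CdSe parameters (the effective masses are an assumption).
CDSE = {"e_bulk_ev": 1.74, "m_e_rel": 0.13, "m_h_rel": 0.45, "eps_rel": 10.0}


def brus_gap_ev(radius_nm, e_bulk_ev, m_e_rel, m_h_rel, eps_rel):
    """Size-dependent gap in eV: bulk gap + confinement term - Coulomb term."""
    r = radius_nm * 1e-9
    confinement = H**2 / (8 * r**2) * (1 / (m_e_rel * M_0) + 1 / (m_h_rel * M_0))
    coulomb = 1.8 * E_CHARGE**2 / (4 * math.pi * eps_rel * EPS_0 * r)
    return e_bulk_ev + (confinement - coulomb) / E_CHARGE  # joules -> eV


def emission_wavelength_nm(gap_ev):
    """lambda = h*c / E, returned in nanometres."""
    return H * C / (gap_ev * E_CHARGE) * 1e9


for radius_nm in (2.0, 3.0, 6.0):
    gap = brus_gap_ev(radius_nm, **CDSE)
    print(f"r = {radius_nm} nm: gap = {gap:.2f} eV, "
          f"lambda = {emission_wavelength_nm(gap):.0f} nm")
```

Smaller radii give a wider gap and a shorter wavelength. The effective-mass picture tends to overestimate confinement for very small dots, so the output is best read as a trend, not a calibration.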

The simulation covers CdSe, ZnS, InP, and PbS presets, each with its own bulk bandgap, effective masses, and Bohr radius. The glow colour displayed is mapped from the emission wavelength to an approximate sRGB value using the CIE colour-matching functions, so the colour shift you see is physically grounded, not just aesthetic. Quantum dots are now commercially used in QLED displays, medical imaging probes, and solar concentrators, precisely because the emission peak can be placed anywhere in the visible and near-infrared spectrum by choosing the particle size.

The exciton Bohr radius is a_B = eps * (m_0/mu) * a_0, where m_0 is the free-electron mass, mu is the reduced effective mass of the electron-hole pair (1/mu = 1/m_e* + 1/m_h*), and a_0 is the hydrogen Bohr radius (0.053 nm). For CdSe, a_B ≈ 5.6 nm; below this radius, confinement effects become significant.
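As a worked example of this formula, the sketch below computes the reduced mass and the exciton Bohr radius in Python. The inputs are assumptions chosen for illustration (effective masses of 0.13 and 0.45 times the free-electron mass, eps = 10), and the function names are hypothetical. With these inputs the result lands a little above 5 nm; the exact figure depends on which dielectric constant and effective masses are used.

```python
A_0_NM = 0.0529177  # hydrogen Bohr radius, nm


def reduced_mass(m_e_rel, m_h_rel):
    """Reduced mass of the electron-hole pair, in units of the free-electron mass."""
    return m_e_rel * m_h_rel / (m_e_rel + m_h_rel)


def exciton_bohr_radius_nm(eps_rel, m_e_rel, m_h_rel):
    """a_B = eps * (free-electron mass / reduced mass) * a_0, in nanometres."""
    return eps_rel * A_0_NM / reduced_mass(m_e_rel, m_h_rel)


# Assumed CdSe-like inputs, for illustration only.
print(round(exciton_bohr_radius_nm(10.0, 0.13, 0.45), 2), "nm")
```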

II. Molecular Self-Assembly — Order for Free

🧬

Molecular Self-Assembly — CMC, Packing Parameter & Phase Diagram

Tune molecule geometry with the packing parameter slider and watch the system tip between micelles, vesicles, cylinders, and bilayers. Shows CMC threshold and aggregate size distribution.

Life runs on self-assembly. Cell membranes, ribosomes, viral capsids — none of these are stitched together by a machine. They form spontaneously because their components are designed (by evolution) to be thermodynamically unfavourable as isolated molecules and thermodynamically favourable as organised aggregates.

The key players are amphiphiles: molecules with a water-loving (hydrophilic) head and a water-hating (hydrophobic) tail. In water, they face a choice: expose the tails to water and pay a large entropic/enthalpic penalty, or bury the tails by aggregating. Below a threshold concentration called the critical micelle concentration (CMC) there are too few molecules to form stable aggregates, so they exist as monomers and adsorb at the air–water interface, reducing surface tension. Above the CMC, new molecules preferentially join aggregates rather than increasing the monomer concentration — which is why the surface tension of a surfactant solution plateaus above the CMC.
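The plateau can be illustrated with the idealised pseudo-phase picture, in which the monomer concentration simply saturates at the CMC. The sketch below is a toy under that assumption, not the simulation's model: the function name is hypothetical, and the CMC of 8 mM is a round number of the order reported for sodium dodecyl sulfate in water.

```python
def partition(total_mM, cmc_mM):
    """Split a total surfactant concentration into (monomer, micellised) parts.

    Idealised pseudo-phase model: monomers accumulate up to the CMC,
    and every molecule added beyond it goes into aggregates.
    """
    monomer = min(total_mM, cmc_mM)
    micellised = max(0.0, total_mM - cmc_mM)
    return monomer, micellised


CMC_MM = 8.0  # assumed, illustrative value
for total in (2.0, 8.0, 20.0, 50.0):
    monomer, micellised = partition(total, CMC_MM)
    print(f"total {total:5.1f} mM -> monomer {monomer:4.1f} mM, "
          f"in micelles {micellised:4.1f} mM")
```

Because surface tension tracks the monomer concentration, a flat monomer curve above the CMC corresponds to the surface-tension plateau. Real systems show a smoother crossover than this sharp kink.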

Which aggregate geometry forms depends on the molecular geometry, captured by the dimensionless packing parameter p:

p = v / (a_0 * l_c)

  v   : volume of the hydrophobic tail(s)
  a_0 : optimal head-group area (set by electrostatics and hydration)
  l_c : critical chain length (roughly 0.8 * fully extended chain)

p < 1/3       : spherical micelles
1/3 < p < 1/2 : cylindrical micelles (wormlike)
1/2 < p < 1   : vesicles / flexible bilayers
p ~= 1        : planar bilayers (cell membranes, lipid bilayers)
p > 1         : inverted micelles (water-in-oil)
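The regime table above translates into a small lookup. This is a minimal sketch with hypothetical function names; the tolerance that decides how close to 1 counts as "p ~= 1" is an assumption, since the table does not fix it, and the example molecule dimensions are illustrative and not measured values.

```python
def packing_parameter(tail_volume_nm3, head_area_nm2, chain_length_nm):
    """p = v / (a_0 * l_c); dimensionless."""
    return tail_volume_nm3 / (head_area_nm2 * chain_length_nm)


def predicted_aggregate(p, bilayer_tol=0.05):
    """Map a packing parameter to the aggregate shape listed in the table.

    bilayer_tol is an assumed window around p = 1 for planar bilayers.
    """
    if p <= 0:
        raise ValueError("packing parameter must be positive")
    if p < 1 / 3:
        return "spherical micelles"
    if p < 1 / 2:
        return "cylindrical micelles"
    if p < 1 - bilayer_tol:
        return "vesicles / flexible bilayers"
    if p <= 1 + bilayer_tol:
        return "planar bilayers"
    return "inverted micelles"


# Illustrative single-chain amphiphile: v = 0.35 nm^3, a_0 = 0.65 nm^2, l_c = 1.67 nm
p = packing_parameter(0.35, 0.65, 1.67)
print(round(p, 2), predicted_aggregate(p))

for p in (0.2, 0.4, 0.75, 1.0, 1.2):
    print(p, predicted_aggregate(p))
```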

Single-chain detergents with bulky heads sit around p ≈ 0.3 and make spherical micelles — the kind that clean your dishes by encapsulating grease droplets. Phospholipids with two chains have p ≈ 1 and form the planar bilayers that constitute every cell membrane on Earth. The simulation lets you walk p from 0.2 to 1.2 and watch the aggregate morphology in a coarse-grained molecular dynamics cell switch between all five regimes.

The underlying driving force is the hydrophobic effect: not a direct attractive force between tails, but the gain in entropy of the surrounding water molecules when the tails stop disrupting their hydrogen-bond network. Because this driving force is largely entropic, self-assembly is temperature-sensitive: the CMC and the preferred aggregate shape shift with temperature, and solutions of many non-ionic surfactants turn cloudy and phase-separate when heated above their cloud point.

III. Decision Tree (CART) — Splitting on Impurity

🌳

Decision Tree — CART, Gini Impurity & Depth Control

Train on a 2D dataset, control max depth and minimum leaf size. Visualise the piecewise decision boundary, the tree diagram, and a feature-importance bar chart.

A decision tree answers a classification question by asking a sequence of yes/no questions about the input features. Each internal node tests one feature against a threshold; the two branches partition the training data; and the recursion continues until the leaves are pure enough or the tree is deep enough.

The CART (Classification And Regression Trees) algorithm chooses each split by minimising Gini impurity — a measure of how mixed the class labels are in a node. A perfectly pure node (all one class) has Gini = 0. A maximally impure node (equal class proportions) has Gini = 1 − 1/K for K classes. At each node, CART evaluates every feature and every threshold value in the training data, and picks the (feature, threshold) pair that produces the largest weighted reduction in Gini impurity across the two child nodes.

Gini(node) = 1 - sum_k p_k^2
  p_k : fraction of samples in class k at this node

Split gain = Gini(parent) - (n_L/n)*Gini(left) - (n_R/n)*Gini(right)
  n_L, n_R : number of samples in left and right children
  n        : total samples at parent
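The Gini and split-gain formulas above, together with the exhaustive threshold search described earlier, can be sketched for a single feature as follows. This is a minimal illustration, not the simulation's implementation: the function names are hypothetical, and placing candidate thresholds at midpoints between consecutive distinct feature values is an assumed (though common) convention.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2 (0.0 for an empty or pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gain(parent, left, right):
    """Weighted reduction in Gini impurity achieved by a split."""
    n = len(parent)
    return gini(parent) - len(left) / n * gini(left) - len(right) / n * gini(right)


def best_threshold(xs, ys):
    """Try every midpoint between sorted feature values; return (gain, threshold)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best_gain, best_thr = 0.0, None
    for k in range(1, len(order)):
        lo, hi = xs[order[k - 1]], xs[order[k]]
        if lo == hi:
            continue  # no threshold can separate identical values
        left = [ys[i] for i in order[:k]]
        right = [ys[i] for i in order[k:]]
        gain = split_gain(ys, left, right)
        if gain > best_gain:
            best_gain, best_thr = gain, (lo + hi) / 2
    return best_gain, best_thr


print(best_threshold([1.0, 2.0, 3.0, 4.0], ["a", "a", "b", "b"]))
```

A full tree would repeat this search over every feature at every node and recurse into the two children.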

Information gain (entropy mode):
  H(node) = - sum_k p_k * log2(p_k)
  Gain    = H(parent) - (n_L/n)*H(left) - (n_R/n)*H(right)
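The entropy criterion follows the same pattern. A minimal sketch with hypothetical function names:

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits: -sum_k p_k * log2(p_k)."""
    n = len(labels)
    if n == 0:
        return 0.0
    probs = [count / n for count in Counter(labels).values()]
    return sum(-p * math.log2(p) for p in probs)


def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    return (entropy(parent)
            - len(left) / n * entropy(left)
            - len(right) / n * entropy(right))


parent = [0, 0, 1, 1]
print(entropy(parent))                           # balanced two-class node
print(information_gain(parent, [0, 0], [1, 1]))  # split into two pure children
```

A balanced two-class node carries one bit of entropy, and a split into two pure children recovers all of it.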

Leaf prediction: majority class of training samples in that leaf
Stopping rule (pre-pruning): stop if depth >= max_depth or n < min_samples_leaf

The simulation makes the overfitting problem vivid. A tree grown to max depth on a noisy dataset carves out tiny islands around every outlier — a jagged, bumpy decision boundary that will fail on new data. Capping depth at 3 or 4 produces a coarser but far more generalisable boundary. The accompanying random-forest simulation (from devlog #95) shows how averaging 50 such trees smooths out this variance.

CART is also used for regression (minimising mean squared error instead of Gini), and it is the base learner underlying gradient-boosted tree ensembles like XGBoost and LightGBM. The feature importance output of a CART tree (measured as total impurity reduction attributed to each feature) gives an intuitive, if biased, ranking of which inputs matter most.
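For the regression case, the split criterion swaps class impurity for mean squared error around the node mean. The sketch below shows that substitution under simple assumptions (hypothetical function names, a leaf that predicts the mean of its training targets); it is not taken from the simulation.

```python
def mse(values):
    """Mean squared error of a node around its own mean (the leaf prediction)."""
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def mse_reduction(parent, left, right):
    """Size-weighted drop in MSE achieved by a split; the regression analogue of split gain."""
    n = len(parent)
    return mse(parent) - len(left) / n * mse(left) - len(right) / n * mse(right)


targets = [1.0, 1.0, 5.0, 5.0]
print(mse(targets))                                    # spread around the mean of 3.0
print(mse_reduction(targets, [1.0, 1.0], [5.0, 5.0]))  # a split that removes all of it
```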

IV. Q-Learning — Learning by Doing

🤖

Q-Learning Agent — Bellman Equation & Grid World

Watch the Q-table heat map update in real time as the agent explores a grid with walls, rewards, and penalties. Tune learning rate, discount factor, and epsilon decay.

Reinforcement learning is the science of learning from consequences. Unlike supervised learning, which needs labelled training data, an RL agent learns by trying actions, receiving rewards or penalties, and updating its behaviour accordingly. Q-learning is one of the oldest and most elegant RL algorithms: in tabular environments it provably converges to the optimal action values, provided every state-action pair keeps being visited and the learning rate is decayed appropriately, and it needs no model of how the environment works.

The agent maintains a table Q(s, a) that estimates the optimal action-value function Q*: the expected total discounted reward for taking action a in state s and then acting optimally thereafter. After each transition (s, a) → (r, s′), the Bellman optimality equation gives the target value: the immediate reward r plus the best currently estimated future value from s′, discounted by γ. The Q-value is then moved toward this target by a step of size α.

Bellman target:  y = r + gamma * max_{a'} Q(s', a')

TD update:       Q(s, a) += alpha * [y - Q(s, a)]
  alpha  : learning rate (e.g. 0.1 -- smaller = slower but stable)
  gamma  : discount factor (0.9 gives an effective horizon of about 1/(1 - gamma) = 10 steps)
  r      : reward signal (+1 for goal, -1 for pit, 0 otherwise)

Policy:  epsilon-greedy
  with prob epsilon : take a random action (exploration)
  with prob 1-epsilon : take argmax_a Q(s, a) (exploitation)
  epsilon decays each episode: eps *= decay_rate
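Put together, the target, the TD update, and the epsilon-greedy policy fit in a few dozen lines. The sketch below is a minimal tabular version on a hypothetical 3x4 grid: the layout, start cell, step cap, and all function names are assumptions for illustration, not the simulation's code. Rewards follow the convention above (+1 goal, -1 pit, 0 otherwise).

```python
import random

ROWS, COLS = 3, 4
START, GOAL, PIT, WALL = (2, 0), (0, 3), (1, 3), (1, 1)  # assumed layout
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]             # up, down, left, right


def step(state, action):
    """Deterministic move; hitting the edge or the wall leaves the agent in place."""
    row, col = state[0] + action[0], state[1] + action[1]
    if not (0 <= row < ROWS and 0 <= col < COLS) or (row, col) == WALL:
        row, col = state
    nxt = (row, col)
    if nxt == GOAL:
        return nxt, 1.0, True
    if nxt == PIT:
        return nxt, -1.0, True
    return nxt, 0.0, False


def greedy_action(q, state):
    return max(range(len(ACTIONS)), key=lambda a: q[state][a])


def train(episodes=2000, alpha=0.1, gamma=0.9, epsilon=1.0, decay_rate=0.995, seed=0):
    rng = random.Random(seed)
    q = {(r, c): [0.0] * len(ACTIONS) for r in range(ROWS) for c in range(COLS)}
    for _ in range(episodes):
        state = START
        for _ in range(100):  # step cap so every episode ends
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))   # explore
            else:
                a = greedy_action(q, state)       # exploit
            nxt, reward, done = step(state, ACTIONS[a])
            # Bellman target; a terminal state has no future value
            y = reward if done else reward + gamma * max(q[nxt])
            q[state][a] += alpha * (y - q[state][a])  # TD update
            state = nxt
            if done:
                break
        epsilon *= decay_rate  # decay once per episode
    return q


def greedy_path(q, max_steps=20):
    """Follow argmax_a Q(s, a) from START until a terminal cell or the step limit."""
    state, path = START, [START]
    for _ in range(max_steps):
        state, _, done = step(state, ACTIONS[greedy_action(q, state)])
        path.append(state)
        if done:
            break
    return path


q = train()
print(greedy_path(q))
print({s: round(max(v), 2) for s, v in q.items()})  # max_a Q(s, a) per cell
```

The last line prints the per-cell maximum over actions, which is the quantity a Q-table heat map would display.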

The simulation visualises the Q-table as a heat map: each cell shows the maximum Q-value over all actions, which you can interpret as “how good is it to be in this state?” Early in training the map is uniform. As the agent accumulates experience, values propagate backwards from the reward through the Bellman recursion — a process called temporal-difference credit assignment — and the heat map develops a gradient pointing toward the goal.

The ε-greedy policy resolves the exploration–exploitation dilemma in the simplest possible way: inject randomness to explore unknown states, but exploit accumulated knowledge most of the time. Set ε too high and the agent keeps wandering at random instead of following what it has learned; set it too low and it can lock onto the first workable route and never discover a better one. The simulation lets you watch this tension directly by freezing ε at various values and observing whether the agent finds the optimal path.

Q-learning is off-policy: it updates toward the greedy action max_a' Q(s', a') regardless of which action the behaviour policy actually chose. This is what allows learning from exploratory trajectories without bias — a key theoretical advantage over on-policy methods like SARSA.
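The difference between the two methods comes down to one term in the target. A minimal sketch, assuming a Q-table stored as a dict of per-state lists (a hypothetical layout, as are the function names):

```python
def q_learning_target(q, reward, next_state, gamma):
    """Off-policy: bootstrap from the best action in s', whatever is taken next."""
    return reward + gamma * max(q[next_state])


def sarsa_target(q, reward, next_state, next_action, gamma):
    """On-policy: bootstrap from the action the behaviour policy actually takes in s'."""
    return reward + gamma * q[next_state][next_action]


q = {"s1": [0.2, 0.8]}  # two actions available in state s1
print(q_learning_target(q, 0.0, "s1", 0.9))  # uses the 0.8 entry
print(sarsa_target(q, 0.0, "s1", 0, 0.9))    # uses the 0.2 entry if action 0 was explored
```

When the behaviour policy happens to pick the greedy action in s′, the two targets coincide; they differ only on exploratory steps.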
