A Q-learning agent learns to navigate an 8×8 grid maze. The agent explores using an ε-greedy policy: it takes a random action with probability ε and the greedy (highest-valued) action otherwise. After each step, the table is updated as Q(s,a) ← Q(s,a) + α[r + γ·max Q(s',·) − Q(s,a)]. The Q-table is visualised as a colour heatmap, and the greedy policy is shown as arrows. Three maze presets are available, each with walls and fire-penalty cells.
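The ε-greedy choice and the update rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the demo's actual source: the names `choose_action` and `q_update`, the four-action layout, and the handling of terminal states (treating max Q(s',·) as 0 once the episode ends) are all assumptions.

```python
import random

N_STATES = 8 * 8   # one state per grid cell
N_ACTIONS = 4      # assumed action set: up, right, down, left


def choose_action(q_row, epsilon, rng=random):
    """ε-greedy: random action with probability ε, greedy action otherwise."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    # Greedy: index of the largest Q-value (first one wins on ties).
    return max(range(len(q_row)), key=lambda a: q_row[a])


def q_update(q, s, a, r, s_next, alpha, gamma, terminal=False):
    """Apply Q(s,a) <- Q(s,a) + α[r + γ·max Q(s',·) − Q(s,a)] in place."""
    # Assumption: no future value is bootstrapped from a terminal state.
    best_next = 0.0 if terminal else max(q[s_next])
    q[s][a] += alpha * (r + gamma * best_next - q[s][a])
    return q[s][a]


# Q-table initialised to zero for every (state, action) pair.
q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
```

With ε = 0 the agent always follows the arrows shown on the grid; with ε = 1 every move is random, which is why early episodes look like aimless wandering.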

Reinforcement Learning 🎮

Episode: 0
Step / ep.: 0
ε (explore): 1.00
Ep. reward: 0

↩ = start (S)
⭐ = goal (+10)
🔥 = fire (−5)
Maze
ε (explore) 1.00
α (learn) 0.5
γ (discount) 0.95
Speed Medium
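To get a feel for what the learning rate does, the short sketch below applies the update rule three times to a single move that steps onto the goal, using the default α = 0.5 and the +10 goal reward. It assumes the goal is terminal, so the γ·max Q(s',·) term is zero; the variable names are illustrative only.

```python
ALPHA = 0.5         # default learning rate from the control panel
GAMMA = 0.95        # default discount factor
GOAL_REWARD = 10.0  # reward for reaching the goal

q_value = 0.0       # Q(s,a) for the move that enters the goal cell
history = []
for _ in range(3):
    # Assumption: the goal is terminal, so the best next value is 0.
    best_next = 0.0
    q_value += ALPHA * (GOAL_REWARD + GAMMA * best_next - q_value)
    history.append(q_value)

print(history)  # [5.0, 7.5, 8.75]
```

Each visit closes half of the remaining gap to the reward, so a smaller α would learn more slowly but react less sharply to any single step.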