Reinforcement Learning (RL)

Q-learning agent in a grid world

Episode

Steps

Last reward: 0.0

Window success: 0/50

Arrow shows best action; number below shows the Q-value of that action.
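
As a rough illustration of how the arrow and the number for one cell could be derived from a Q-table, here is a minimal sketch. The four-action ordering, the `ACTIONS` and `ARROWS` tables, and the `best_action` helper are assumptions made for this example; the demo's actual data layout is not shown on the page.

```python
# Hypothetical layout: each cell stores four Q-values in the order up, right, down, left.
ACTIONS = ["up", "right", "down", "left"]
ARROWS = {"up": "↑", "right": "→", "down": "↓", "left": "←"}

def best_action(q_row):
    """Return (arrow, q_value) for the highest-valued action in one cell."""
    # max() keeps the first index on ties, so ties resolve in ACTIONS order.
    best_index = max(range(len(q_row)), key=lambda i: q_row[i])
    return ARROWS[ACTIONS[best_index]], q_row[best_index]

arrow, value = best_action([0.2, 1.5, -0.3, 0.0])
print(f"{arrow} {value:.2f}")  # arrow for "right", with its Q-value underneath
```

A cell whose Q-values are all still zero would show the first action in the list, which is one reason early arrows can look arbitrary before learning has touched that cell.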

Legend: Agent, Goal (+10), Trap (−10)

Chart: total reward earned per episode. The curve trends upward as learning progresses.

Learning rate (α): 0.10

Discount (γ): 0.95

Exploration (ε): 0.30

Speed (steps/sec): 20

Success threshold: 90%

Window size: 50
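
To show where α, γ, and ε enter the picture, the sketch below pairs ε-greedy action selection with the standard tabular Q-learning update. It uses the default values from the controls above. The dict-of-lists Q-table, the function names, and the `done` flag for goal/trap cells are assumptions for illustration, not the demo's own code.

```python
import random

ALPHA, GAMMA, EPSILON = 0.10, 0.95, 0.30  # defaults shown on the controls

def choose_action(q_row, epsilon=EPSILON, rng=random):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))  # random action index
    return max(range(len(q_row)), key=lambda a: q_row[a])  # greedy action index

def q_update(q, state, action, reward, next_state, done,
             alpha=ALPHA, gamma=GAMMA):
    """Apply one tabular Q-learning step and return the new Q(state, action)."""
    # Assumption: goal and trap end the episode, so there is no future value to add.
    target = reward if done else reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]

# Hypothetical two-state table with four actions per state.
q = {0: [0.0] * 4, 1: [0.0] * 4}
q_update(q, state=0, action=1, reward=10.0, next_state=1, done=True)
print(q[0][1])  # moved one tenth of the way from 0 toward the +10 goal reward
```

With α = 0.10 each update closes a tenth of the gap between the current estimate and the target, and with γ = 0.95 a reward one step further away counts for 95% as much.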

Training stops automatically once the success rate over the last N episodes (N = window size) reaches the success threshold. Hard cap: 500 episodes.
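
The stopping rule can be sketched with a fixed-length window of recent outcomes. This is a minimal sketch under two assumptions the page does not confirm: that "success" means the episode ended at the goal, and that the rate is only checked once the window is full. The `should_stop` name and its parameters are hypothetical.

```python
from collections import deque

def should_stop(outcomes, episode, threshold=0.90, window=50, max_episodes=500):
    """Decide whether training should end after the given episode count.

    outcomes: deque(maxlen=window) of booleans, True when the episode succeeded.
    """
    if episode >= max_episodes:  # hard cap
        return True
    if len(outcomes) < window:   # assumption: wait until the window is full
        return False
    return sum(outcomes) / window >= threshold

# Usage: append one result per episode; the deque drops the oldest automatically.
outcomes = deque(maxlen=50)
for succeeded in [True] * 45 + [False] * 5:
    outcomes.append(succeeded)
print(should_stop(outcomes, episode=50))  # 45/50 meets the 90% threshold
```

A larger window makes the rule less sensitive to a lucky streak, at the cost of needing more episodes before it can trigger.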