Reinforcement Learning (RL)

Q-learning agent in a grid world

Episode: 0
Steps: 0
Last reward: 0.0
Window success: 0/50

Grid world

Arrow shows best action; number below shows the Q-value of that action.
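
As a rough illustration of how each cell's arrow and number could be derived, the sketch below picks the highest-valued action from one cell's Q-values. The action order, the arrow glyphs, and the function name `best_action` are assumptions made for this example; the demo's actual encoding is not shown on the page.

```python
# Hypothetical action order and glyphs; not taken from the demo itself.
ACTIONS = ["up", "right", "down", "left"]
ARROWS = {"up": "↑", "right": "→", "down": "↓", "left": "←"}


def best_action(q_values):
    """Return (arrow, value) for the highest-valued action in one cell.

    Ties resolve to the first action in ACTIONS, which is an arbitrary choice.
    """
    idx = max(range(len(q_values)), key=lambda i: q_values[i])
    return ARROWS[ACTIONS[idx]], q_values[idx]


# Q-values for one cell, in ACTIONS order.
arrow, value = best_action([0.2, 1.5, -0.3, 0.0])
print(f"{arrow} {value:.2f}")  # prints "→ 1.50"
```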

Legend: Agent, Goal (+10), Trap (−10)

Reward per episode

Total reward earned in each episode. The curve should trend upward as the agent learns.

Hyperparameters

Learning rate (α): 0.10
Discount (γ): 0.95
Exploration (ε): 0.30
Speed (steps/sec): 20
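
To show where α, γ, and ε enter, here is a minimal sketch of standard tabular Q-learning with ε-greedy action selection, using the default values listed above. The function names, the dictionary-based Q-table, and the four-action layout are assumptions for illustration, not the demo's own code.

```python
import random

ALPHA, GAMMA, EPSILON = 0.10, 0.95, 0.30  # defaults shown in the panel
N_ACTIONS = 4  # assumed: up, right, down, left


def choose_action(q_row, epsilon=EPSILON, rng=random):
    """Epsilon-greedy: a random action with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])


def q_update(q, state, action, reward, next_state, done,
             alpha=ALPHA, gamma=GAMMA):
    """One tabular Q-learning step: Q(s,a) += alpha * (target - Q(s,a))."""
    # No future value is bootstrapped from a terminal state (goal or trap).
    future = 0.0 if done else max(q[next_state])
    target = reward + gamma * future
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


# Two hypothetical states, all Q-values initialised to zero.
q = {s: [0.0] * N_ACTIONS for s in range(2)}
# Reaching the goal (+10) from state 0 moves Q(0, 1) a fraction alpha
# of the way from 0.0 toward the target of 10.0.
new_value = q_update(q, state=0, action=1, reward=10.0, next_state=1, done=True)
```

A larger α moves each estimate further toward its target per step, γ controls how much future reward counts, and ε sets how often the agent tries a random action instead of the current best one.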

Stopping rule

Success threshold: 90%
Window size: 50

Training stops automatically once the success rate over the last N episodes (N = window size) reaches the threshold. Hard cap: 500 episodes.
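
The stopping rule above can be sketched with a fixed-length window of recent outcomes. This is a sketch under stated assumptions: the name `should_stop` is hypothetical, and requiring the window to be full before the threshold check applies is a guess about the demo's behaviour.

```python
from collections import deque


def should_stop(outcomes, episode, threshold=0.90, window=50, max_episodes=500):
    """Return True when training should end.

    Stops at the hard episode cap, or once the window is full and the
    fraction of successful episodes in it meets the threshold.
    Assumption: a partially filled window never triggers a stop.
    """
    if episode >= max_episodes:
        return True
    return len(outcomes) == window and sum(outcomes) / window >= threshold


# deque(maxlen=50) silently drops the oldest outcome once it is full,
# so it always holds at most the last 50 episodes.
outcomes = deque(maxlen=50)
for episode in range(1, 61):
    outcomes.append(episode > 10)  # toy data: the first 10 episodes fail
    if should_stop(outcomes, episode):
        break
```

Each entry is a boolean recording whether that episode reached the goal, so `sum(outcomes)` is the success count shown in the "Window success" readout.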