Reinforcement Learning (RL)
Q-learning agent in a grid world
Episode: 0
Steps: 0
Last reward: 0.0
Window success: 0/50
Grid world
In each cell, the arrow shows the best (highest-valued) action, and the number below it shows the Q-value of that action.
Legend: Agent, Goal (+10), Trap (−10)
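As a rough illustration of how a cell's arrow and number could be derived, the sketch below picks the highest-valued action from one cell's Q-values. The action order in `ARROWS` and the helper name `best_action` are hypothetical; the demo's own ordering and tie-breaking are not stated.

```python
# Hypothetical action order (up, right, down, left); the demo may use another.
ARROWS = ["↑", "→", "↓", "←"]

def best_action(q_values):
    """Return (arrow, q_value) for one cell, given its list of action values.

    Ties go to the first action in ARROWS order (an assumption of this sketch).
    """
    idx = max(range(len(q_values)), key=lambda a: q_values[a])
    return ARROWS[idx], q_values[idx]

# Example cell whose best action is "right" with Q-value 0.9.
arrow, value = best_action([0.1, 0.9, -0.2, 0.0])
print(arrow, f"{value:.2f}")
```

An untrained cell with all-zero Q-values would show the first arrow in this sketch, which is one reason a real display might hide arrows until a cell has been updated.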
Reward per episode
Total reward the agent earns in each episode. The curve should trend upward as learning progresses.
Hyperparameters
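The page does not list which hyperparameters it exposes. For orientation, the sketch below shows where the three usual tabular Q-learning knobs would enter: a learning rate `alpha`, a discount factor `gamma`, and an exploration rate `epsilon`. The names, default values, and the dict-based Q-table are assumptions for illustration, not the demo's actual settings.

```python
import random

def q_update(q, state, action, reward, next_state, done, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q-value.

    q maps each state to a list of action values (assumed layout).
    alpha and gamma defaults are illustrative, not taken from the demo.
    """
    # Terminal transitions (goal or trap) have no future value to bootstrap from.
    target = reward if done else reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]

def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Example: stepping into the goal (+10) from a fresh state.
q = {"s": [0.0, 0.0, 0.0, 0.0], "goal": [0.0, 0.0, 0.0, 0.0]}
new_value = q_update(q, "s", 1, 10.0, "goal", done=True)
```

In this sketch a larger `alpha` moves each estimate further toward the new target, a larger `gamma` weights future reward more heavily, and a larger `epsilon` makes the agent explore more often.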
Stopping rule
Training stops automatically once the success rate over the last N episodes reaches the threshold. Hard cap: 500 episodes.
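One way such a stopping rule could be implemented is with a fixed-size sliding window of episode outcomes. In the sketch below, the window of 50 mirrors the "0/50" counter and the cap of 500 comes from the text above; the class name `StoppingRule`, the 0.9 threshold, and the requirement that the window be full before stopping are assumptions.

```python
from collections import deque

class StoppingRule:
    """Sliding-window success-rate check with a hard episode cap (sketch)."""

    def __init__(self, window=50, threshold=0.9, max_episodes=500):
        # deque(maxlen=...) drops the oldest outcome automatically.
        self.results = deque(maxlen=window)
        self.threshold = threshold        # assumed value; the page leaves it configurable
        self.max_episodes = max_episodes  # hard cap from the page text
        self.episodes = 0

    def record(self, success):
        """Store whether the latest episode reached the goal."""
        self.results.append(bool(success))
        self.episodes += 1

    def should_stop(self):
        if self.episodes >= self.max_episodes:
            return True
        # Assumption: wait for a full window so early lucky runs do not stop training.
        window_full = len(self.results) == self.results.maxlen
        return window_full and sum(self.results) / len(self.results) >= self.threshold

# Example with a small window: three successes out of four meets a 0.75 threshold.
rule = StoppingRule(window=4, threshold=0.75)
for outcome in (True, True, False, True):
    rule.record(outcome)
```

A training loop would call `record` after each episode and break as soon as `should_stop()` returns true.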