Visualization of VINDICATE. In the original state (left), the Key’s placement exposes the bottom wall in the agent’s view, triggering a learned wall-following heuristic and a suboptimal detour (red).