MiniGrid Goal-Reaching
The method and experimental setup evaluated here are identical to the goal-conditioned reinforcement learning described in Section 3.1 of the submission, with one distinction:
because the map changes each episode, the Transformer model has an additional context embedding that is a function of the current observation image.
This embedding is the output of a small convolutional neural network and is added to the token embeddings analogously to the treatment of position embeddings.
The agent position and goal state are not included in the map; these are provided as input tokens as described in Section 3.1.
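The context-conditioning scheme above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the layer sizes, kernel sizes, and module names are all assumptions. The key point it shows is that a small convolutional network maps the observation image to a single vector, which is broadcast-added to the token embeddings exactly as a position embedding would be.

```python
import torch
import torch.nn as nn

class ContextConditionedEmbedding(nn.Module):
    """Sketch: token embeddings augmented with a map-dependent context
    embedding, added analogously to position embeddings.
    All dimensions are illustrative, not taken from the paper."""

    def __init__(self, vocab_size=32, embed_dim=64, seq_len=16, map_channels=3):
        super().__init__()
        self.token_embed = nn.Embedding(vocab_size, embed_dim)
        # Learned position embeddings, one per sequence position.
        self.pos_embed = nn.Parameter(torch.zeros(1, seq_len, embed_dim))
        # Small CNN mapping the observation image to one context vector.
        self.context_cnn = nn.Sequential(
            nn.Conv2d(map_channels, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # collapse spatial dims
            nn.Flatten(),
            nn.Linear(32, embed_dim),
        )

    def forward(self, tokens, obs_image):
        # tokens: (batch, seq_len) int64; obs_image: (batch, C, H, W) float
        ctx = self.context_cnn(obs_image).unsqueeze(1)  # (batch, 1, embed_dim)
        # Context vector broadcasts across the sequence dimension,
        # mirroring how position embeddings are added.
        return self.token_embed(tokens) + self.pos_embed + ctx
```

Because the context vector is shared across all positions in the sequence, the Transformer itself is unchanged; only the input embeddings depend on the current map.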
The action space of this environment is discrete.
There are seven actions, but only four are required to complete the tasks: turning left, turning right, moving forward, and opening a door.
The training data is a mixture of trajectories from a pre-trained goal-reaching policy and a uniform random policy.
The model reaches 94% of test goals on held-out maps.
Lock symbols indicate doors in MiniGrid-MultiRoom-N4-S5.