PlaNet Baseline

PlaNet Model Baseline

Reference

On this page we visualize the predictions of the recurrent model (PlaNet [Hafner et al., 2018]) on the Humanoid locomotion prediction task. The original PlaNet architecture was designed for pixel observations, so we modify the observation model to make it compatible with our task; wherever possible, we keep the original design decisions (in both architecture and training procedure) intact. The data-collecting policy is the same as that in the submission, with ground-truth trajectories looking like the following:

Note that the planning horizon used with PlaNet (12) is shorter than the prediction horizon of this task (100). We observe that the model is accurate for the first dozen steps of the rollout, consistent with the reported results.

Stochastic prediction

The latent transition model outputs the parameters of a Gaussian distribution with diagonal covariance over the predicted latent state. The latent state is sampled from this distribution and then decoded deterministically into the actual state of the humanoid using the observation model. We found that this stochastic prediction was generally unreliable:

Deterministic prediction

We found decoding the latent state deterministically (using the mean as the prediction instead of sampling from the parameterized output distribution) to be somewhat better, but still noticeably worse than the Transformer due to compounding errors:
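The difference between the two prediction modes can be sketched in a few lines of Python. This is a minimal illustration, not the PlaNet implementation: `transition` and `decode` are hypothetical toy stand-ins for the learned latent transition model and the observation model, and the constants in them are arbitrary. Only the choice between sampling the latent state and taking its mean reflects the description above.

```python
import random


def transition(latent, action):
    """Hypothetical stand-in for the learned latent transition model.

    Returns the mean and standard deviation of a diagonal Gaussian
    over the next latent state (one entry per latent dimension).
    """
    mean = [0.9 * z + 0.1 * action for z in latent]
    std = [0.05 for _ in latent]
    return mean, std


def decode(latent):
    """Hypothetical stand-in for the deterministic observation model."""
    return [2.0 * z for z in latent]


def rollout(latent, actions, stochastic, rng=None):
    """Roll the latent state forward and decode a state at every step."""
    rng = rng or random.Random(0)
    states = []
    for action in actions:
        mean, std = transition(latent, action)
        if stochastic:
            # Stochastic prediction: sample each dimension independently.
            latent = [rng.gauss(m, s) for m, s in zip(mean, std)]
        else:
            # Deterministic prediction: use the mean as the next latent.
            latent = mean
        # The predicted latent is fed back in, so errors can compound.
        states.append(decode(latent))
    return states


# Prediction horizon of 100 steps, as in the task described above.
actions = [1.0] * 100
det = rollout([0.0, 0.0], actions, stochastic=False)
sto = rollout([0.0, 0.0], actions, stochastic=True, rng=random.Random(0))
```

With the same starting latent and actions, the deterministic rollout is repeatable, while the stochastic rollout injects fresh noise at every step and feeds it back into the next transition.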