(Transformer variants)
A continuous variant of the Transformer architecture, with outputs parameterizing a Gaussian distribution, performed on par with a standard MLP dynamics model.
We found that the unmodified GPT architecture applied to discretized data yielded more accurate long-horizon predictions.
For details of the discretization methods used, see this page describing their implementation.
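As a rough illustration of what discretizing continuous data can look like, here is a minimal sketch of uniform per-dimension binning, one common scheme. The function names, signatures, and bin count below are illustrative assumptions, not the actual implementation referenced above.

```python
import numpy as np

def discretize(x, low, high, n_bins=100):
    """Map continuous values in [low, high] to integer tokens in [0, n_bins - 1]."""
    # Clip to the valid range, then scale into bin indices.
    x = np.clip(x, low, high)
    tokens = np.floor((x - low) / (high - low) * n_bins).astype(int)
    # Values exactly at `high` would map to n_bins; fold them into the last bin.
    return np.minimum(tokens, n_bins - 1)

def undiscretize(tokens, low, high, n_bins=100):
    """Map integer tokens back to the center of their bin."""
    return low + (tokens + 0.5) / n_bins * (high - low)
```

A GPT-style model can then operate on the resulting integer tokens exactly as it would on text, with `undiscretize` recovering approximate continuous values from predicted tokens.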