Offline RL

Offline RL Benchmark Results

(Offline reinforcement learning) TTO performs on par with or better than the best prior offline reinforcement learning algorithms on the D4RL benchmark suite. Results for TTO are the mean over 15 random seeds (5 independently trained Transformers, with 3 trajectories per Transformer); error bars show the standard deviation across these runs. The sources of the reported performance for the other methods are detailed in Appendix C, and the same results are listed in tabular form in Appendix E.
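As an illustration of the aggregation described in the caption, the following minimal sketch pools the scores from 5 models with 3 rollouts each into 15 runs and reports their mean and standard deviation. The function name `aggregate_runs` and the score values are hypothetical placeholders, not results from the paper. The use of the sample standard deviation (n - 1 in the denominator) is also an assumption, since the caption does not say which estimator was used.

```python
from statistics import mean, stdev


def aggregate_runs(scores_by_model):
    """Pool per-model rollout scores and return (mean, std, number of runs).

    scores_by_model: one inner list of rollout scores per trained model.
    The std is the sample standard deviation over all pooled runs
    (an assumption; the caption does not specify the estimator).
    """
    runs = [score for model_scores in scores_by_model for score in model_scores]
    return mean(runs), stdev(runs), len(runs)


# Placeholder scores only (NOT results from the paper):
# 5 independently trained models, 3 rollouts per model.
placeholder_scores = [[1.0, 2.0, 3.0] for _ in range(5)]

avg, std, n_runs = aggregate_runs(placeholder_scores)
print(f"runs={n_runs} mean={avg:.3f} std={std:.3f}")
```

Pooling all 15 runs treats rollouts from the same model as independent samples. If between-model variation were the quantity of interest, one could instead average within each model first and take the standard deviation over the 5 per-model means.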