Beam Search Hyperparameters
Beam width |
256 |
Planning horizon |
10 |
Vocabulary size |
100 |
Context size [number of $(\mathbf{s}, \mathbf{a}, r, V)$ tuples] |
5 |
$k_\text{obs}$ [top-k tokens from which observations are sampled] |
1 |
$k_\text{act}$ [top-k tokens from which actions are sampled] |
20 |
Beam width and context size are standard hyperparameters for decoding Transformer language models.
Planning horizon is a standard trajectory optimization hyperparameter.
$k_\text{obs}$ and $k_\text{act}$ indicate that actions are sampled from the most likely $20\%$ of action tokens and next observations are decoded greedily conditioned on previous observations and actions.