Best practices

Hyperparameters

As a starting point, aim for at least a few hundred examples.

Generally, the more data you provide, the better the model will adapt to a particular task. We recommend using the default values for all hyperparameters (sketched as a configuration after this list):

  • Batch size: 16
  • Epochs: 64
  • Learning rate: 5e-4
  • Number of virtual tokens: 100

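For concreteness, the defaults above can be written as a plain configuration mapping. This is only an illustrative sketch: the key names are hypothetical and do not correspond to any specific library's API.

```python
# Illustrative defaults as a configuration mapping; the key names are
# hypothetical, not a specific library's API.
default_hparams = {
    "batch_size": 16,           # examples per optimizer step
    "epochs": 64,               # full passes over the training set
    "learning_rate": 5e-4,      # optimizer step size
    "num_virtual_tokens": 100,  # length of the tuned soft prompt
}
```
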
Changing hyperparameter values from the defaults may either increase or decrease the quality of the tuned model. Several rounds of experimentation might be needed to find a better model.
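
One simple way to run those rounds of experiments is a small sweep around the defaults. The sketch below is purely illustrative: `train` and `evaluate` are hypothetical placeholders for your actual tuning job and validation pass, the value grids are arbitrary, and `default_hparams` is the mapping from the earlier sketch.

```python
import random
from itertools import product

# Hypothetical stand-ins for a real tuning job and validation pass.
def train(hparams: dict) -> dict:
    return hparams  # placeholder: a real call would return a tuned model

def evaluate(model: dict) -> float:
    return random.random()  # placeholder: score on a held-out validation set

default_hparams = {"batch_size": 16, "num_virtual_tokens": 100}

# Sweep a few values around the defaults and keep the best-scoring run.
learning_rates = [1e-4, 5e-4, 1e-3]
epoch_counts = [32, 64, 128]

best_score, best_config = float("-inf"), None
for lr, epochs in product(learning_rates, epoch_counts):
    config = {**default_hparams, "learning_rate": lr, "epochs": epochs}
    score = evaluate(train(config))
    if score > best_score:
        best_score, best_config = score, config

print(f"Best config: {best_config} (score={best_score:.3f})")
```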

Smaller models tend to require more epochs to train.
A model can train poorly for a number of reasons: for example, a dataset that is too small, a learning rate that is too large, or too few epochs. Common symptoms include poor or incomplete responses. In that case, you may need to retrain the model with a different set of hyperparameters. In the future, we will allow logging the training process to W&B for inspection.
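
If you suspect a badly trained model, a rough heuristic can help flag incomplete responses at scale before you retrain. The check below is an illustrative sketch, not a formal diagnostic; the length threshold and terminal-punctuation test are arbitrary assumptions.

```python
# Rough heuristic for flagging likely-incomplete generations; the length
# threshold and terminal-punctuation check are arbitrary assumptions.
def looks_incomplete(response: str, min_chars: int = 20) -> bool:
    text = response.strip()
    return len(text) < min_chars or text[-1] not in ".!?\"'"

responses = [
    "The capital of France is Paris.",
    "The capital of France is",  # cut off mid-sentence
]
print([r for r in responses if looks_incomplete(r)])
# ['The capital of France is']
```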