Hyperparameters
As a starting point, aim for at least a few hundred examples when fine-tuning a model; in general, the more data you provide, the better the model will adapt to a particular task. Tuning the hyperparameters, the settings that control the learning process, can also often yield a model that produces higher-quality completions.
We recommend using the default values for all hyperparameters:
- Batch size: 16
- Epochs: 64
- Learning rate: 5e-4
- Number of virtual tokens: 100
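To make these defaults concrete, here is a minimal sketch of how they might be expressed as a configuration. The key names below are illustrative assumptions, not a real API; match them to whatever your fine-tuning call actually expects.

```python
# Default fine-tuning hyperparameters, expressed as a config dict.
# Key names are assumptions for illustration; adapt them to your
# platform's actual fine-tuning API.
default_hyperparameters = {
    "batch_size": 16,           # training samples per parameter update
    "epochs": 64,               # full passes through the training set
    "learning_rate": 5e-4,      # step size for each weight update
    "num_virtual_tokens": 100,  # length of the trainable prompt prefix
}
print(default_hyperparameters)
```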
Batch size is the number of training samples (e.g., sentences or paragraphs) used to update the model's parameters in each training iteration. In general, we've found that larger batch sizes tend to work better for larger datasets.
An epoch is one full training pass through the dataset. The number of epochs you should set depends on the number of training examples in your dataset and the size of the model you're starting from. Generally, larger models and larger datasets (many thousands of training examples) need fewer epochs, while smaller models tend to need more. We recommend slightly overtraining your models.
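A quick back-of-the-envelope calculation shows how batch size and epochs interact, and why larger datasets need fewer epochs: each epoch already provides many parameter updates. The dataset size below is an assumption for illustration.

```python
import math

n_examples = 5000   # size of your training set (assumed for illustration)
batch_size = 16     # default value from above
epochs = 64         # default value from above

# Each epoch performs one parameter update per batch.
updates_per_epoch = math.ceil(n_examples / batch_size)  # 313
total_updates = updates_per_epoch * epochs              # 20032

print(f"{updates_per_epoch} updates per epoch, {total_updates} total")
```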
Learning rate controls how quickly the model updates the concepts it has learned: it determines the size of each weight update during fine-tuning.
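As a toy illustration of what the learning rate does, here is a single gradient-descent step on one weight. The weight and gradient values are made up for the example.

```python
# A single gradient-descent update: new_weight = weight - lr * gradient.
# The weight and gradient below are made-up values for illustration.
weight = 0.80
gradient = 2.0

for lr in (5e-4, 5e-2):
    updated = weight - lr * gradient  # standard gradient-descent step
    print(f"lr={lr}: {weight:.4f} -> {updated:.4f}")
# The default lr=5e-4 nudges the weight slightly (0.8000 -> 0.7990),
# while a 100x larger rate jumps much further (0.8000 -> 0.7000).
```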
Changing hyperparameter values from the defaults may either increase or decrease the quality of the tuned model, and several rounds of experimentation may be needed to find a better configuration. Models can also end up badly trained for a number of reasons: a dataset that is too small, a learning rate that is too large, or an insufficient number of epochs. In these cases, you may have to retrain the model with a different set of hyperparameters.
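One simple way to run those rounds of experiments is a small grid search over a few candidate values. The sketch below assumes hypothetical `train_model` and `evaluate` helpers, which are placeholders, not a real API; swap in your platform's actual fine-tuning and evaluation calls.

```python
import itertools

# Candidate values to try; the defaults are included as a baseline.
learning_rates = [5e-5, 5e-4, 5e-3]
epoch_counts = [32, 64, 128]

def train_model(learning_rate, epochs):
    """Hypothetical stand-in for your platform's fine-tuning call."""
    return None  # placeholder; would return a tuned model handle

def evaluate(model):
    """Hypothetical stand-in for evaluation on a held-out set."""
    return 0.0  # placeholder score; replace with a real metric

best_score, best_config = float("-inf"), None
for lr, epochs in itertools.product(learning_rates, epoch_counts):
    model = train_model(learning_rate=lr, epochs=epochs)
    score = evaluate(model)
    if score > best_score:
        best_score, best_config = score, (lr, epochs)

print(f"Best config: lr={best_config[0]}, epochs={best_config[1]}")
```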
In the future, we will allow logging the training process to Weights & Biases (W&B) for inspection.
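Until that integration is available, the snippet below shows what typical W&B logging looks like in a training loop you control yourself. This is generic `wandb` library usage, not a feature of our fine-tuning API, and the project name and loss values are placeholders.

```python
import wandb

# Generic Weights & Biases usage, shown only to illustrate what
# training-run logging looks like; the project name and loss values
# below are placeholders, not output from our fine-tuning service.
run = wandb.init(
    project="finetune-experiments",
    config={"batch_size": 16, "epochs": 64, "learning_rate": 5e-4},
)

for epoch in range(64):
    train_loss = 0.0  # placeholder; in a real loop, compute this epoch's loss
    wandb.log({"epoch": epoch, "train_loss": train_loss})

run.finish()
```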