k-th token not sampled. A lower value can improve quality by removing the long tail of less likely tokens, making the model less likely to go off topic.
1. As a corollary, if you are modifying temperature, make sure that you set top-p to the default of 1.
**Modifying both temperature and top-p from their defaults at the same time can lead to undesired results.** The model first applies the temperature to reshape the token probabilities, and then performs top-p sampling on the adjusted distribution to narrow down the selection. This process can produce outputs that become either too constrained or too random, depending on your chosen values.
To strike a balance between control and diversity in text generation, it's generally recommended to modify either top-p or temperature, but not both, based on your desired outcome. Experimenting with different values and observing the generated outputs can help you determine which parameter suits your specific needs.
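To make the interaction between the two parameters concrete, here is a minimal sketch of a sampling step over raw logits. The `sample_token` helper is hypothetical (not part of any API mentioned above); it assumes the common ordering in which temperature rescales the logits first, and top-p then prunes the resulting distribution:

```python
import numpy as np

def sample_token(logits, temperature=1.0, top_p=1.0, rng=None):
    """Sketch of temperature + top-p (nucleus) sampling over raw logits."""
    rng = rng or np.random.default_rng()
    # Temperature: values below 1 sharpen the distribution, above 1 flatten it.
    scaled = logits / temperature
    probs = np.exp(scaled - np.max(scaled))  # softmax, numerically stable
    probs /= probs.sum()
    # Top-p: sort tokens by probability and keep the smallest "nucleus"
    # whose cumulative probability reaches top_p.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, top_p) + 1
    keep = order[:cutoff]
    # Renormalize over the surviving tokens and sample one of them.
    nucleus = probs[keep] / probs[keep].sum()
    return int(rng.choice(keep, p=nucleus))
```

With `top_p=1.0` this reduces to plain temperature sampling, and with `temperature=1.0` it is pure nucleus sampling, which is why changing both at once compounds their effects.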
best_of
`best_of` defaults to 1, but it accepts any integer between 1 and 5.
In language models, `best_of` is usually used in the context of beam search, an algorithm for generating sequences of words. Beam search explores multiple possible word sequences and selects the most promising ones based on a scoring mechanism.
In beam search, the `best_of` value refers to the number of top candidates considered at each step of the generation process. For example, if you set `best_of` to 3, the algorithm considers the top 3 candidates at each step and proceeds with further exploration.
The higher the `best_of` value, the more candidates the LLM considers, which can increase the quality and accuracy of the output, but also the computational time and cost of generation.
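The pruning step described above can be sketched in a few lines. This is a toy beam search, not any provider's implementation; `step_fn` is a hypothetical callback standing in for the model, returning `(token, log_prob)` pairs for the next position:

```python
from heapq import nlargest

def beam_search(step_fn, start, best_of=3, length=5):
    """Toy beam search keeping the `best_of` highest-scoring partial
    sequences at every step."""
    beams = [([start], 0.0)]  # (sequence, cumulative log-probability)
    for _ in range(length):
        candidates = []
        for seq, score in beams:
            # Extend every surviving sequence with every proposed token.
            for token, logp in step_fn(seq):
                candidates.append((seq + [token], score + logp))
        # Prune: only the top `best_of` candidates survive this step.
        beams = nlargest(best_of, candidates, key=lambda c: c[1])
    return beams[0]  # highest-scoring sequence and its score
```

A larger `best_of` means more `candidates` are carried forward each round, which is exactly where the extra quality, and the extra compute, comes from.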