Model Parameters


Our models learn language by reading text from the internet. Given a sentence such as “I like to play with legos”, the model is asked to repeatedly predict the next token, shown here as [?]:

**I [?]**

**I like [?]**

**I like to [?]**

**I like to play [?]**

**I like to play with [?]**

The model learns that the word “to” is quite likely to follow the word “like” in English, that the word “with” is likely to follow the word “play”, and so on.
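The prediction steps above can be sketched in a few lines. This is only an illustration: it assumes whitespace-separated words stand in for tokens (real tokenizers split text differently), and the function name `next_token_examples` is hypothetical.

```python
def next_token_examples(sentence: str) -> list[tuple[str, str]]:
    """Split a sentence into (context, next_token) pairs."""
    # Assumption: one word equals one token. Real tokenizers differ.
    tokens = sentence.split()
    return [(" ".join(tokens[:i]), tokens[i]) for i in range(1, len(tokens))]


pairs = next_token_examples("I like to play with legos")
for context, target in pairs:
    # Each line shows the context the model sees and the token it should predict.
    print(f"{context} [?]  ->  {target}")
```

Each pair is one prediction task: the model sees the context and is scored on how much probability it assigned to the token that actually came next.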

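Likelihood refers to the probability that a given sequence of tokens (e.g., words or characters) will be generated by the model. The likelihood of a token is reported as a number (typically between -15 and 0) that quantifies the model's level of surprise that this token was used in a sentence. Because the score is never positive, it behaves like a log probability: a value near 0 means the token was expected, and a more negative value means the model was more surprised.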

If a token has a low likelihood, it means the model did not expect this token to be used. Conversely, if a token has a high likelihood, the model was confident that it would be used.

The likelihood score can be used to evaluate the quality of the model's output, representing the model's confidence that a given output is the most likely sequence of tokens given the input prompt and the target data.
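As a rough illustration of how such scores behave, the sketch below assumes the likelihood is a natural-log probability (which would be consistent with the -15 to 0 range mentioned above, but is an assumption here). The probabilities are made up, and the function names are hypothetical.

```python
import math


def token_log_likelihoods(probs: list[float]) -> list[float]:
    """Convert per-token probabilities into log-likelihood scores."""
    return [math.log(p) for p in probs]


def sequence_log_likelihood(probs: list[float]) -> float:
    """Score a whole sequence by summing its per-token log likelihoods."""
    return sum(token_log_likelihoods(probs))


# Made-up probabilities: two expected tokens followed by one surprising token.
example_probs = [0.9, 0.5, 0.001]
scores = token_log_likelihoods(example_probs)
total = sequence_log_likelihood(example_probs)

for p, s in zip(example_probs, scores):
    print(f"probability {p:<6} -> log likelihood {s:.2f}")
print(f"sequence log likelihood: {total:.2f}")
```

Summing log likelihoods is equivalent to multiplying the underlying probabilities, so a single very unlikely token pulls the score of the whole sequence down.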


The same prompt may yield different outputs each time you hit "generate". Temperature is a parameter that controls the randomness and diversity of the generated output.

The temperature scale ranges from 0 to 1. A temperature of 1.0 means that the model is generating outputs with its normal level of confidence, based on the probabilities it has learned during training.

A higher temperature will increase the randomness of the output, allowing for more diverse and unexpected results. A lower temperature will make the model more conservative in its predictions, generating outputs that are more likely to align with what it has learned during training.
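One common way temperature is implemented is by dividing the model's raw scores (logits) by the temperature before converting them to probabilities. The sketch below assumes that formulation; the logits are made up and `softmax_with_temperature` is a hypothetical helper, not part of any API.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn logits into probabilities after scaling them by 1 / temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be greater than 0 in this sketch")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
cool = softmax_with_temperature(logits, 0.2)
normal = softmax_with_temperature(logits, 1.0)

print("temperature 0.2:", [round(p, 3) for p in cool])
print("temperature 1.0:", [round(p, 3) for p in normal])
```

At the lower temperature almost all of the probability mass moves to the highest-scoring token, which is why the output becomes more conservative and repeatable.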

Most people will find that a temperature of 1 is a good starting point.

Temperature defaults to 0.7 if you do not set it explicitly.

With longer prompts, the model becomes more confident in its predictions, so you can raise the temperature to get diverse and creative output without it drifting too far off topic. In contrast, using high temperatures on short prompts can make outputs very unstable.

We recommend lower temperature values for tasks like classification, entity extraction, or question answering, and higher temperature values for tasks like content or idea generation.

Top-k & top-p

Top-k sampling and top-p sampling are methods for choosing output tokens. Both restrict the pool of candidate tokens, which lets you trade off the quality and the diversity of the generated output.


Top-k is currently not an active parameter, but if you believe your outputs could be improved with this parameter, contact your Writer representative.

Top-k defaults to 40, but it will accept any integer between 1 and 50,400.

Top-k refers to keeping only the k tokens with the highest predicted probabilities and then sampling the next token from those k candidates. Tokens are sorted by probability, and every token ranked below the k-th is excluded from sampling.

A lower value can improve quality by removing the long tail of less likely tokens, which makes the output less likely to go off topic but may also make it less diverse.
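A minimal sketch of the top-k filtering step, assuming the remaining probabilities are renormalized before sampling (a common choice, but an assumption here). The token probabilities are made up and `top_k_filter` is a hypothetical helper.

```python
def top_k_filter(probs: dict[str, float], k: int) -> dict[str, float]:
    """Keep the k most probable tokens and renormalize their probabilities."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = ranked[:k]
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Made-up next-token probabilities after "I like to play with".
next_token_probs = {"legos": 0.5, "dolls": 0.3, "fire": 0.15, "zebra": 0.05}
top_two = top_k_filter(next_token_probs, k=2)
print(top_two)
```

With k=2 the unlikely tokens "fire" and "zebra" can no longer be sampled at all, no matter what the temperature is.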


Top-p defaults to 1 but accepts any number between 0 and 1.

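Top-p refers to selecting the smallest set of the most likely tokens (e.g., words or characters) whose combined predicted probability reaches p, and then sampling the next token from that set. For example, with top-p set to 0.9, the model keeps adding tokens in order of probability until they account for 90% of the probability mass, and ignores the rest.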
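The sketch below shows one common formulation of top-p (often called nucleus sampling): keep the smallest set of highest-probability tokens whose cumulative probability reaches p. Whether the service implements it exactly this way is an assumption; the probabilities are made up and `top_p_filter` is a hypothetical helper.

```python
def top_p_filter(probs: dict[str, float], p: float) -> dict[str, float]:
    """Keep the most probable tokens until their cumulative probability reaches p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    # Renormalize so the kept probabilities sum to 1 before sampling.
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}


# Made-up next-token probabilities.
candidate_probs = {"legos": 0.5, "dolls": 0.3, "fire": 0.15, "zebra": 0.05}
nucleus = top_p_filter(candidate_probs, p=0.75)
everything = top_p_filter(candidate_probs, p=1.0)
print(nucleus)
print(everything)
```

Unlike top-k, the number of tokens kept varies: when the model is confident, a single token may be enough to reach p, and when it is unsure, many tokens stay in the pool.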

Top-p is an alternative way of controlling the randomness of the generated text. Generally, top-p will provide better control where your model is expected to generate text with accuracy and correctness.

When modifying top-p, make sure that temperature is set to 1. Conversely, if you are modifying temperature, make sure that top-p is set to its default of 1.

Simultaneously modifying temperature and top-p from their defaults may lead to undesired results. When both top-p and temperature are applied together, the top-p sampling is performed first to narrow down the selection of words, and then the temperature is applied to reweight the remaining probabilities. This can result in outputs that are overly constrained or overly random, depending on the specific values used.
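To see how the two settings interact, the sketch below follows the order described above: top-p narrows the candidates first, then temperature reweights what is left. It assumes temperature reweighting can be expressed as raising each probability to the power 1 / temperature (equivalent to dividing logits by the temperature). The probabilities are made up and the function name is hypothetical.

```python
def top_p_then_temperature(
    probs: dict[str, float], top_p: float, temperature: float
) -> dict[str, float]:
    """Apply top-p filtering, then reweight the survivors by temperature."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= top_p:
            break
    # Assumption: p ** (1 / T) reweighting, then renormalization.
    weights = {token: prob ** (1.0 / temperature) for token, prob in kept}
    total = sum(weights.values())
    return {token: w / total for token, w in weights.items()}


token_probs = {"legos": 0.5, "dolls": 0.3, "fire": 0.15, "zebra": 0.05}
sharp = top_p_then_temperature(token_probs, top_p=0.75, temperature=0.5)
flat = top_p_then_temperature(token_probs, top_p=0.75, temperature=2.0)
print("low temperature: ", sharp)
print("high temperature:", flat)
```

Both runs keep the same two tokens, but the temperature then pushes the result toward a near-certain choice or toward an even split, which is why changing both settings at once makes the outcome harder to predict.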

To strike a balance between control and diversity in text generation, it is generally recommended to modify either top-p or temperature, not both, depending on your desired outcome. Experimenting with different values and observing the generated outputs can help you determine which parameter suits your specific needs.


best_of defaults to 1, but it will accept any integer between 1 and 5.

For language models, "best_of" is usually associated with "beam search," an algorithm for generating sequences of words. Beam search explores multiple possible word sequences and keeps the most promising ones based on a scoring mechanism.

In beam search, the "best_of" value refers to the number of top candidates that will be considered at each step of the generation process. For example, if "best_of" is set to 3, the algorithm will consider the top 3 candidates at each step and proceed with further exploration.

The higher the "best_of" value, the more candidates are considered. This can increase the quality and accuracy of the output, but it also increases the computational time and cost of the generation.
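The beam search behavior described above can be sketched with a toy example in which "best_of" acts as the number of candidates kept at each step. Everything here is hypothetical: the `TOY_MODEL` table of next-token probabilities is made up, and the actual service may score and prune candidates differently.

```python
import math

# Hypothetical toy model: next-token probabilities given the previous token.
TOY_MODEL = {
    "<s>": {"I": 0.6, "We": 0.4},
    "I": {"like": 0.5, "play": 0.5},
    "We": {"like": 0.9, "play": 0.1},
}


def beam_search(best_of: int, steps: int) -> list[tuple[list[str], float]]:
    """Keep the best_of highest-scoring sequences at every generation step."""
    beams = [(["<s>"], 0.0)]  # (tokens so far, summed log probability)
    for _ in range(steps):
        candidates = []
        for tokens, score in beams:
            for token, prob in TOY_MODEL.get(tokens[-1], {}).items():
                candidates.append((tokens + [token], score + math.log(prob)))
        if not candidates:
            break
        # Sort by score, best first, and keep only the top best_of candidates.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:best_of]
    return beams


narrow = beam_search(best_of=1, steps=2)
wide = beam_search(best_of=2, steps=2)
print("best_of=1:", narrow[0])
print("best_of=2:", wide[0])
```

With a single candidate, the search commits to "I" early because it is the most likely first token. With two candidates, it also keeps "We" alive and discovers that "We like" has a higher overall score, at the cost of evaluating more sequences.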