Likelihood

WRITER models learn language by reading text from the internet. Given a sentence such as “I like to play with Legos”, the model repeatedly predicts the next token:

**I [?]**

**I like [?]**

**I like to [?]**

**I like to play [?]**

**I like to play with [?]**

The model learns that the word “to” is quite likely to follow the word “like” in English, that the word “with” is likely to follow the word “play”, and so on. Likelihood refers to the probability that the model generates a given sequence of tokens, which can be words or characters. You can think of the likelihood of a token as a number that quantifies how surprised the model is that this token appeared in a sentence. If a token has a low likelihood, the model didn’t expect it to appear; conversely, if a token has a high likelihood, the model was confident that it would. You can use the likelihood score to evaluate the quality of the model’s output: it represents the model’s confidence that a given output is the most likely sequence of tokens given the input prompt and the target data.
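As a rough illustration, the sketch below computes a sequence log-likelihood from hypothetical per-token probabilities. The numbers are made up for the example, not produced by a WRITER model; the point is that the likelihood of a sequence is the product of its per-token probabilities, which is usually computed as a sum of logs.

```python
import math

# Hypothetical probabilities a model might assign to each token of
# "I like to play with Legos", given the tokens that came before it.
token_probs = [0.20, 0.05, 0.60, 0.30, 0.45, 0.02]

# The sequence likelihood is the product of per-token probabilities;
# summing log probabilities avoids numerical underflow.
log_likelihood = sum(math.log(p) for p in token_probs)
print(f"Sequence log-likelihood: {log_likelihood:.2f}")  # about -11.03
```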

Temperature

The same prompt may yield different outputs each time you hit “generate”. Temperature is a parameter that controls the randomness and diversity of the generated output. The temperature scale ranges from 0 to 1, and temperature defaults to 0.7 if you don’t set it explicitly. A temperature of 1.0 means the model generates outputs with its normal level of confidence, based on the probabilities it learned during training. A higher temperature increases the randomness of the output, allowing for more diverse and unexpected results. A lower temperature makes the model more conservative, generating outputs that align more closely with what it learned during training. Most people find that a temperature between 0.7 and 1 is a good starting point.

With longer prompts, the model becomes more confident in its predictions, so you can raise the temperature for diverse, creative output without the output going too far off topic. In contrast, high temperatures on short prompts can lead to unstable outputs. Use lower temperature values for tasks like classification, entity extraction, or question answering, and higher temperature values for tasks like content or idea generation.
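Mechanically, temperature divides the model’s raw scores (logits) before they are turned into probabilities. Here is a minimal Python sketch of that standard formulation; it illustrates the general technique, not necessarily WRITER’s exact implementation.

```python
import math

def softmax_with_temperature(logits, temperature=0.7):
    """Turn raw logits into probabilities, scaled by temperature."""
    # Dividing logits by the temperature before the softmax sharpens
    # the distribution when temperature < 1 (more conservative) and
    # flattens it when temperature > 1 (more random and diverse).
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]
print(softmax_with_temperature(logits, temperature=0.7))  # sharper
print(softmax_with_temperature(logits, temperature=1.0))  # raw confidence
```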

Top-k and top-p

Top-k sampling and top-p sampling are methods for selecting output tokens; they let you control the quality and diversity of the generated output.

Top-k

Top-k is currently not an active parameter, but if you believe your outputs could improve with it, contact your Writer representative. Top-k defaults to 40 and accepts any integer between 1 and 50,400. Top-k sampling sorts the tokens by predicted probability, keeps only the k most likely tokens, and randomly samples the next token from among them; tokens below the k-th are never sampled. This method generates outputs that are more likely to be accurate, but may be less diverse. A lower value can improve quality by removing the long tail of less likely tokens, making the model less likely to go off topic.
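Here is a minimal sketch of top-k sampling in Python, assuming you already have a list of logits over the vocabulary. It shows the general technique, not WRITER’s internal implementation.

```python
import math
import random

def top_k_sample(logits, k=40):
    """Sample the next token ID from the k highest-probability tokens."""
    # Sort token IDs by logit, keep the top k, and drop everything
    # below the k-th token so it can never be sampled.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)  # stabilize the exponentials
    weights = [math.exp(logits[i] - peak) for i in top]
    return random.choices(top, weights=weights, k=1)[0]

# Toy vocabulary of 5 tokens; with k=2 only the two most likely
# tokens (IDs 0 and 3) can ever be chosen.
print(top_k_sample([3.0, 0.1, -1.0, 2.5, 0.0], k=2))
```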

Top-p

Top-p defaults to 1 but accepts any number between 0 and 1. Top-p sampling, also known as nucleus sampling, selects the smallest set of most likely tokens whose cumulative probability reaches p, and then randomly samples the next token from that set. Top-p is an alternative way of controlling the randomness of the generated text, and it generally provides better control when you need the model to generate text with accuracy and correctness.

**When modifying top-p, make sure that temperature is 1. Likewise, if you are modifying temperature, keep top-p at its default of 1.** Modifying both temperature and top-p from their defaults at the same time can lead to undesired results: the model first performs top-p sampling to narrow down the token selection and then applies the temperature to adjust the remaining probabilities, so the output can become either too constrained or too random depending on your chosen values. To strike a balance between control and diversity in text generation, modify either top-p or temperature based on your desired outcome. Experimenting with different values and observing the generated outputs can help you determine which parameter suits your specific needs.
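For illustration, here is a minimal sketch of top-p (nucleus) sampling over a list of logits. Again, this demonstrates the general technique rather than WRITER’s internal implementation.

```python
import math
import random

def top_p_sample(logits, p=0.9):
    """Sample the next token ID from the smallest set of tokens whose
    cumulative probability reaches p (nucleus sampling)."""
    peak = max(logits)  # stabilize the exponentials
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    # Pair each token ID with its probability, most likely first.
    ranked = sorted(enumerate(e / total for e in exps),
                    key=lambda pair: pair[1], reverse=True)
    nucleus, cumulative = [], 0.0
    for token_id, prob in ranked:
        nucleus.append((token_id, prob))
        cumulative += prob
        if cumulative >= p:
            break  # the nucleus now covers probability mass p
    ids, weights = zip(*nucleus)
    return random.choices(ids, weights=weights, k=1)[0]

print(top_p_sample([3.0, 0.1, -1.0, 2.5, 0.0], p=0.9))
```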

best_of

best_of defaults to 1, but it accepts any integer between 1 and 5. In language models, best_of is usually used with beam search, an algorithm for generating sequences of words. Beam search explores multiple possible word sequences and selects the most promising ones based on a scoring mechanism. The best_of value is the number of top candidates the algorithm considers at each step of the generation process. For example, if you set best_of to 3, the algorithm keeps the top 3 candidates at each step and continues exploring from them. The higher the best_of value, the more candidates the model considers, which can increase the quality and accuracy of the output, as well as the computational time and cost of generation.
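The sketch below shows a generic beam search with best_of as the beam width. The `step_fn` callback is a hypothetical stand-in for the model, assumed to return a list of (token, log-probability) continuations for a given sequence; it is not part of any WRITER API.

```python
def beam_search(step_fn, start, best_of=3, max_steps=5):
    """Keep the `best_of` highest-scoring sequences at every step.

    `step_fn(sequence)` is a hypothetical callback standing in for the
    model: it returns a list of (token, log_probability) continuations.
    """
    beams = [(start, 0.0)]  # (sequence, cumulative log probability)
    for _ in range(max_steps):
        candidates = []
        for seq, score in beams:
            for token, logp in step_fn(seq):
                candidates.append((seq + [token], score + logp))
        if not candidates:
            break  # no continuations left; stop early
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:best_of]  # prune to the top best_of beams
    return beams[0]  # the highest-scoring sequence and its score
```

A larger best_of widens the search at each step, which is why quality can improve while computation grows: the model scores best_of times as many candidate continuations per step.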