Language models process "tokens" rather than raw characters or bytes. Each token is a small unit of text, such as a whole word, a piece of a word, or a punctuation mark.

For example, consider the sentence "I love to play soccer." A tokenizer might split it into the tokens "I", "love", "to", "play", and "soccer" (often with an additional token for the punctuation). The model then uses this sequence of tokens to predict what is likely to come next, such as "with" or "on."
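As a quick, hands-on sketch of what this looks like in practice, the snippet below uses the open-source Hugging Face transformers package and its GPT-2 tokenizer. Both are assumptions for illustration only; they are not the tokenizer our models use, and the exact token boundaries depend on the tokenizer you pick.

```python
# A minimal sketch using the Hugging Face `transformers` package (illustration only;
# this is not the tokenizer our generation models use).
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

text = "I love to play soccer."
tokens = tokenizer.tokenize(text)  # sub-word strings, e.g. ['I', 'Ġlove', 'Ġto', ...]
ids = tokenizer.encode(text)       # the integer IDs a model actually consumes

print(tokens)
print(ids)
```

The 'Ġ' prefix in GPT-2 tokens simply marks a leading space; other tokenizers use different conventions for the same information.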

The idea is that by breaking the text into smaller pieces drawn from a fixed vocabulary, the model can better capture the meaning and context of each piece. This helps the model make more accurate predictions and generate more natural language.

Think of it like cooking a meal. If you have all the ingredients chopped up into small pieces, it's easier to mix and cook them together to make a delicious dish. Tokens in large language models work in a similar way, making it easier for the model to process and understand the input text.

Tokenization is the process of breaking input text down into these smaller units. Depending on the tokenization method a model uses, a common word may be represented by a single token, while a rarer or more complex word may be broken down into multiple sub-word tokens, as in the sketch below.
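Continuing with the same illustrative GPT-2 tokenizer (again, an assumption for demonstration, not our production tokenizer), you can see the difference between a frequent word and a long, rare word:

```python
# Frequent words are often a single token; long or rare words are usually split.
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

for word in ["play", "antidisestablishmentarianism"]:
    pieces = tokenizer.tokenize(word)
    print(f"{word!r} -> {len(pieces)} token(s): {pieces}")
```

Because of this, the number of tokens in a prompt is usually somewhat larger than the number of words, which matters when you are budgeting against a token limit.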

Our generation models support up to max_tokens = 1024, and our token vocabulary is created using Byte Pair Encoding (BPE).
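For intuition about how a BPE vocabulary is built, here is a toy, word-level sketch of the core idea (a simplified illustration with a made-up corpus, not our production training procedure): starting from characters, the most frequent adjacent pair of symbols is repeatedly merged into a new vocabulary entry.

```python
# Toy illustration of the core BPE merge loop (simplified; not our production tokenizer).
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across the corpus, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0] if pairs else None

def merge_pair(words, pair):
    """Replace every occurrence of the pair with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = merged.get(tuple(out), 0) + freq
    return merged

# Start from characters; each merge adds one new token to the vocabulary.
corpus = {tuple("lower"): 5, tuple("lowest"): 2, tuple("newer"): 6, tuple("wider"): 3}
for _ in range(5):
    pair = most_frequent_pair(corpus)
    if pair is None:
        break
    corpus = merge_pair(corpus, pair)
    print("merged", pair, "->", list(corpus))
```

After enough merges, frequent character sequences (and eventually whole common words) become single tokens, while rare words remain split into smaller sub-word pieces.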