Ask a question to an LLM
You can use the text generation endpoint to ask a Palmyra LLM a specific question. It generates a single text response based on a given prompt.
You need an API key to access the Writer API. Get an API key by following the steps in the API quickstart.
We recommend setting the API key as an environment variable named `WRITER_API_KEY` in a `.env` file.
Text generation vs. chat completion
The text generation endpoint is appropriate when you need to generate a single text response based on a given prompt, or when you want to ask a specific LLM a question.
The chat completion endpoint can generate single messages, or create more complex conversations between a user and an LLM. Additionally, the chat completion endpoint offers tool calling, which you can use to access domain-specific LLMs, Knowledge Graphs, and custom functions.
Endpoint overview
URL: `POST https://api.writer.com/v1/completions`
Using the `/completions` endpoint results in charges for model usage. See the pricing page for more information.
Request body
Below are the required and commonly used optional parameters for the text generation endpoint.
| Parameter | Type | Description |
|---|---|---|
| `model` | string | Required. The ID of the model to use for text generation. |
| `prompt` | string | Required. The prompt to generate text from. |
| `max_tokens` | int | The maximum number of tokens to generate for the response. Defaults to `100`. |
| `temperature` | float | Controls the randomness of the generated text. Defaults to `1`. Increase the value for more creative responses, and decrease it for more predictable responses. |
| `stream` | Boolean | Whether to stream the response. Defaults to `false`. |
See the full list of available parameters in the text generation endpoint reference.
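As an illustration, here's a minimal sketch of a request using Python's `requests` library. The URL, model ID, and parameter names come from this page; the Bearer authorization header is an assumption, so check the API reference for the exact authentication scheme.

```python
import os

import requests

# Minimal sketch of a text generation request.
# Assumption: the API accepts a Bearer token in the Authorization header.
API_URL = "https://api.writer.com/v1/completions"
headers = {"Authorization": f"Bearer {os.environ['WRITER_API_KEY']}"}

payload = {
    "model": "palmyra-med",               # required: model ID
    "prompt": "How can I treat a cold?",  # required: prompt to generate from
    "max_tokens": 150,                    # optional: defaults to 100
    "temperature": 0.7,                   # optional: defaults to 1
    "stream": False,                      # optional: defaults to false
}

response = requests.post(API_URL, headers=headers, json=payload)
response.raise_for_status()
print(response.json())
```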
Response parameters
Non-streaming response
If you set the `stream` parameter to `false`, the response is a single JSON object with the following parameters:
| Parameter | Type | Description |
|---|---|---|
| `model` | string | The ID of the model used to generate the response. |
| `choices` | array | An array of choice objects. |
| `choices[0].text` | string | The generated text. |
| `choices[0].log_probs` | object | The log probabilities of the tokens in the generated text. |
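Continuing the request sketch above, reading these fields in Python might look like this:

```python
# Continuing the non-streaming sketch above.
data = response.json()

print(data["model"])               # ID of the model that generated the response
print(data["choices"][0]["text"])  # the generated text
```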
Streaming response
If you set the `stream` parameter to `true`, the response is delivered as server-sent events with the following parameters:
| Parameter | Description |
|---|---|
| `value` | The content of the chunk. |
Example request to a specific LLM
The examples below generate a single message from the `palmyra-med` model, using the prompt “How can I treat a cold?”
Streaming response
The text generation endpoint supports streaming responses. The response arrives in chunks until the full response is complete.
Streaming is useful when you want to display the generated text in real time, or stream the response to a client, rather than waiting for the entire response to finish.
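Here's a sketch of a streaming request in Python. The server-sent event framing is an assumption (each `data:` line carrying a JSON object whose `value` field holds the chunk text); adjust the parsing to match the event format in the endpoint reference.

```python
import json
import os

import requests

# Sketch of a streaming request to the text generation endpoint.
# Assumptions: Bearer auth, and each server-sent event is a `data:` line
# containing a JSON object whose `value` field holds the chunk text.
API_URL = "https://api.writer.com/v1/completions"
headers = {"Authorization": f"Bearer {os.environ['WRITER_API_KEY']}"}
payload = {
    "model": "palmyra-med",
    "prompt": "How can I treat a cold?",
    "stream": True,
}

with requests.post(API_URL, headers=headers, json=payload, stream=True) as response:
    response.raise_for_status()
    for line in response.iter_lines():
        if not line:
            continue
        decoded = line.decode("utf-8")
        if decoded.startswith("data:"):
            body = decoded[len("data:"):].strip()
            if not body or body == "[DONE]":  # skip a possible end-of-stream sentinel
                continue
            event = json.loads(body)
            # Print each chunk as it arrives.
            print(event.get("value", ""), end="", flush=True)
    print()
```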
Non-streaming response
For non-streaming responses, the API returns a single JSON object after the entire response is complete. The generated text is in the `choices[0].text` field.
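A non-streaming version of the same sketch only needs `stream` omitted or set to `false`, and reads the text from `choices[0].text` (Bearer auth remains an assumption, as above):

```python
import os

import requests

# Sketch of a non-streaming request; assumes Bearer auth as above.
response = requests.post(
    "https://api.writer.com/v1/completions",
    headers={"Authorization": f"Bearer {os.environ['WRITER_API_KEY']}"},
    json={
        "model": "palmyra-med",
        "prompt": "How can I treat a cold?",
        "stream": False,
    },
)
response.raise_for_status()

# The generated text is in the first choice.
print(response.json()["choices"][0]["text"])
```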
Next steps
Now that you’ve generated text with a Palmyra LLM, try out the following:
- Create a chat with an AI assistant using the chat completion endpoint
- Learn more about the tool calling feature of the chat completion endpoint