Generate chat completions
The chat completion endpoint allows you to create a conversation between a user and an LLM.
This guide introduces the chat completion endpoint and shows how to create a multi-turn conversation with an LLM, where the conversation history is preserved so the model understands the context of the conversation.
You need an API key to access the Writer API. Get an API key by following the steps in the API quickstart.
We recommend setting the API key as an environment variable in a .env file with the name WRITER_API_KEY.
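For example, with the key stored in a .env file, you can load it and create a client as in this sketch, which assumes the Python SDK (writerai) and the python-dotenv package:

```python
# The .env file contains a line like: WRITER_API_KEY=your-api-key
from dotenv import load_dotenv  # pip install python-dotenv
from writerai import Writer     # pip install writer-sdk

load_dotenv()      # load WRITER_API_KEY from the .env file into the environment
client = Writer()  # the SDK reads WRITER_API_KEY from the environment by default
```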
Chat completion versus text generation
The chat completion endpoint is similar to the text generation endpoint, but it is designed to handle conversations between a user and an LLM.
The chat completion endpoint can generate a single message or support a longer multi-turn conversation, whereas the text generation endpoint generates a single text response to a given prompt.
Additionally, the chat completion endpoint offers tool calling, which you can use to access domain-specific LLMs, Knowledge Graphs, and custom functions.
Endpoint overview
URL: POST https://api.writer.com/v1/chat
Using the /chat endpoint results in charges for model usage. See the pricing page for more information.
Request body
Below are the required and commonly used optional parameters for the chat completion endpoint.
Parameter | Type | Description |
---|---|---|
model | string | Required. The ID of the model to use for the chat completion. Can only be palmyra-x-004. |
messages | array | Required. The conversation history. |
messages[].role | string | Required. The role of the message sender. Can be user, assistant, or system. |
messages[].content | string | Required. The content of the message. |
temperature | float | Influences the randomness of the generated text. Defaults to 1. Increase the value for more creative responses, and decrease it for more predictable responses. |
stream | Boolean | Whether to stream the response. Defaults to false. |
See the chat completion endpoint reference for more information on the request body and the additional parameters you can use to control the conversation.
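As a sketch, a minimal single-turn, non-streaming request with the Python SDK looks like the following; the prompt and temperature value are illustrative:

```python
from writerai import Writer

client = Writer()  # reads WRITER_API_KEY from the environment

response = client.chat.chat(
    model="palmyra-x-004",
    messages=[{"role": "user", "content": "Write a haiku about APIs."}],
    temperature=0.7,  # optional; defaults to 1
)

print(response.choices[0].message.content)  # the generated text
```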
Response parameters
Non-streaming response
If you set the stream parameter to false, the response is delivered as a single JSON object. It contains several parameters describing the response, including the choices array, which contains the generated text.
Parameter | Type | Description |
---|---|---|
model | string | The ID of the model used to generate the response. |
choices | array | An array containing one object with the generated text and additional information. |
choices[0].message.content | string | The generated text. |
See the full list of response parameters in the chat completion endpoint reference.
Streaming response
If you set the stream parameter to true, the response is delivered as server-sent events. Each event contains several parameters; the content of each chunk is in the choices[0].delta.content parameter.
Parameter | Type | Description |
---|---|---|
choices[0].delta.content | string | The content of the chunk. |
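As a sketch with the Python SDK, a streaming request iterates over the events and reads each chunk's content:

```python
from writerai import Writer

client = Writer()  # reads WRITER_API_KEY from the environment

stream = client.chat.chat(
    model="palmyra-x-004",
    messages=[{"role": "user", "content": "Write a haiku about APIs."}],
    stream=True,
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:  # some chunks, such as the final one, may carry no text
        print(content, end="", flush=True)
print()
```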
Sample application
The following sample application uses the Python and JavaScript SDKs to create a command-line chatbot.
The application asks the user for input, passes the conversation history to the LLM, and displays the response from the LLM. It loops until the user enters the message exit.
Stream chat responses
The sample application streams the responses from the LLM. Streaming improves the user experience by showing the response as it is generated, reducing the perceived latency.
Below are the functions used in the sample application to stream the chat responses. They iterate over the chat response chunks and print them to the console. They also collect the full text of the response and return it so it can be added to the conversation history.
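A minimal Python version of such a function might look like the following sketch, assuming the writerai SDK; the function name is illustrative:

```python
def stream_chat_response(client, messages):
    """Stream a chat completion, printing chunks as they arrive,
    and return the full response text."""
    response_text = ""
    stream = client.chat.chat(
        model="palmyra-x-004",
        messages=messages,
        stream=True,
    )
    for chunk in stream:
        content = chunk.choices[0].delta.content
        if content:
            print(content, end="", flush=True)
            response_text += content
    print()  # newline after the streamed response
    return response_text
```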
Full application
The following is the complete sample application that uses the functions defined above to stream the chat responses.
The application asks the user for an initial message, and then enters a loop to handle the conversation between the user and the LLM. It adds the user’s message to the conversation history and streams the response from the LLM to the user.
The loop continues until the user enters the message exit.
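A complete Python version, building on the streaming function above, might look like this sketch (assuming the writerai SDK; names are illustrative):

```python
from writerai import Writer

client = Writer()  # reads WRITER_API_KEY from the environment


def stream_chat_response(client, messages):
    """Stream a chat completion, printing chunks as they arrive,
    and return the full response text."""
    response_text = ""
    stream = client.chat.chat(model="palmyra-x-004", messages=messages, stream=True)
    for chunk in stream:
        content = chunk.choices[0].delta.content
        if content:
            print(content, end="", flush=True)
            response_text += content
    print()
    return response_text


def main():
    messages = []
    user_input = input("You: ")
    while user_input.strip().lower() != "exit":
        # Add the user's message to the history so the model sees
        # the full context of the conversation on every turn
        messages.append({"role": "user", "content": user_input})
        response_text = stream_chat_response(client, messages)
        # Add the assistant's reply to the history as well
        messages.append({"role": "assistant", "content": response_text})
        user_input = input("You: ")


if __name__ == "__main__":
    main()
```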
Best practices
Follow these best practices to ensure that your chatbot behaves as expected:
- Use system messages: Including a system message can guide the behavior of the assistant, setting expectations for its tone and responsiveness. See the sketch after this list.
- Maintain context: Ensure that all relevant parts of the conversation are included in the messages array to maintain context, as the model doesn’t retain memory of past interactions.
- Handle errors gracefully: Implement error handling for various HTTP status codes and API-specific errors such as rate limits or malformed requests.
- Manage conversational flow: Regularly review the conversation’s context and adjust it to keep interactions relevant and concise, especially under the model’s token limit.
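For example, a system message can lead the messages array before the first user turn; the instruction text here is illustrative:

```python
# A system message guides the assistant's tone and behavior for the
# whole conversation; the instruction text is an illustrative example.
messages = [
    {
        "role": "system",
        "content": (
            "You are a concise, friendly support assistant. "
            "Answer in two sentences or fewer."
        ),
    },
    {"role": "user", "content": "How do I reset my password?"},
]
```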
Next steps
Now that you’ve created a chatbot, learn how to add tool calling to your application to enhance the functionality with domain-specific LLMs, Knowledge Graphs, and custom functions.