You need an API key to access the Writer API. Get an API key by following the steps in the API quickstart. We recommend setting the API key as an environment variable in a `.env` file with the name `WRITER_API_KEY`.
Chat completion versus text generation
The chat completion endpoint is similar to the text generation endpoint, but it is designed to handle conversations between a user and an LLM: it can generate a single message or carry on a longer multi-turn exchange, while the text generation endpoint generates a single text response to a given prompt. Additionally, the chat completion endpoint offers tool calling, which you can use to access domain-specific LLMs, Knowledge Graphs, and custom functions.
Endpoint overview
URL: `POST https://api.writer.com/v1/chat`
Using the `/chat` endpoint results in charges for model usage. See the pricing page for more information.
Request body
Below are the required and commonly used optional parameters for the chat completion endpoint.

| Parameter | Type | Description |
|---|---|---|
| `model` | string | Required. The ID of the model to use for the chat completion. Can be `palmyra-x4`, `palmyra-x5`, `palmyra-fin`, `palmyra-med`, `palmyra-creative`, or `palmyra-x-003-instruct`. |
| `messages` | array | Required. The conversation history. |
| `messages[].role` | string | Required. The role of the message sender. Can be `user`, `assistant`, `system`, or `tool`. `system` messages are system prompts, used to provide instructions to the model. `tool` messages are the result of a tool call, and contain the output of the tool call. |
| `messages[].content` | string | Required. The content of the message. |
| `temperature` | float | Controls the randomness of the generated text. Defaults to `1`. Increase the value for more creative responses, and decrease it for more predictable responses. |
| `stream` | Boolean | Whether to stream the response. Defaults to `false`. |
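For reference, here is a minimal sketch of a chat completion request using Python's `requests` library and the parameters documented above. The Bearer-token `Authorization` header is an assumption based on common API conventions; consult the API quickstart for the exact authentication scheme.

```python
import os

import requests

# Minimal chat completion request built from the parameters documented above.
response = requests.post(
    "https://api.writer.com/v1/chat",
    headers={
        # Assumes Bearer-token auth with the WRITER_API_KEY environment variable.
        "Authorization": f"Bearer {os.environ['WRITER_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={
        "model": "palmyra-x4",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Write a haiku about the ocean."},
        ],
        "temperature": 0.7,
        "stream": False,
    },
    timeout=30,
)
response.raise_for_status()
print(response.json())
```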
Response parameters
Non-streaming response
If you set the `stream` parameter to `false`, the response is delivered as a single JSON object. It contains several parameters describing the response, including the `choices` array, which contains the generated text.
| Parameter | Type | Description |
|---|---|---|
| `model` | string | The ID of the model used to generate the response. |
| `choices` | array | An array containing one object with the generated text and additional information. |
| `choices[0].message.content` | string | The generated text. |
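Continuing the sketch above, the generated text can be pulled out of the parsed JSON using the field names from the table:

```python
data = response.json()

# The model that produced the response.
print(data["model"])

# The generated text is in the message content of the first choice.
print(data["choices"][0]["message"]["content"])
```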
Streaming response
If you set the `stream` parameter to `true`, the response is delivered as server-sent events. Each event contains several parameters; the content of the chunk is in the `choices[0].delta.content` parameter.
| Parameter | Type | Description |
|---|---|---|
| `choices[0].delta.content` | string | The content of the chunk. |
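The sketch below shows one way to consume the stream with `requests`. The `data:` prefix and the `[DONE]` sentinel are assumptions based on common server-sent-event conventions, not details this page specifies, so verify them against the API reference.

```python
import json
import os

import requests

with requests.post(
    "https://api.writer.com/v1/chat",
    headers={"Authorization": f"Bearer {os.environ['WRITER_API_KEY']}"},
    json={
        "model": "palmyra-x4",
        "messages": [{"role": "user", "content": "Tell me a story."}],
        "stream": True,
    },
    stream=True,
    timeout=30,
) as response:
    response.raise_for_status()
    for line in response.iter_lines():
        # Assumes each event arrives as a line prefixed with "data: ".
        if not line.startswith(b"data: "):
            continue
        payload = line[len(b"data: "):]
        if payload == b"[DONE]":  # Assumed end-of-stream sentinel.
            break
        chunk = json.loads(payload)
        # The text of each chunk is in choices[0].delta.content.
        print(chunk["choices"][0]["delta"].get("content", ""), end="", flush=True)
```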
Sample application
The following sample application uses the Python and JavaScript SDKs to create a command-line chatbot. The application asks the user for input, passes the conversation history to the LLM, and streams the response from the LLM. It loops until the user enters the message `exit`.
Set a system prompt
To guide the behavior of the assistant, you can set a system prompt by adding a message with the role `system` to the `messages` array.
For example, you can set a system prompt to have the assistant respond in a certain tone or style, or to provide additional context for the conversation. Here’s a system prompt that directs the assistant to be casual and use emojis in its responses:
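A minimal sketch of such a message in Python (the prompt wording here is illustrative; the original sample's exact text may differ):

```python
messages = [
    {
        "role": "system",
        # Illustrative system prompt directing a casual, emoji-friendly tone.
        "content": (
            "You are a friendly, casual assistant. Keep your answers light "
            "and conversational, and use emojis in your responses."
        ),
    },
]
```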
Stream chat responses
The sample application streams the responses from the LLM. Streaming improves the user experience by showing the response as it is generated, reducing the perceived latency of the final answer. Below are the code snippets from the sample application that stream the chat responses. They use a `stream` helper built on the `chat` endpoint to print the chat responses to the console in real time.
The `stream` helper also collects the final response and returns it so it can be added to the conversation history.
Full application
The following is the complete sample application that uses the functions defined above to stream the chat responses. The application sets a system prompt, asks the user for an initial message, and then enters a loop to handle the conversation between the user and the LLM. It adds the user's message to the conversation history and streams the response from the LLM to the user. The loop continues until the user enters the message `exit`.
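Because the original listing is not reproduced here, the following sketch shows how those pieces could fit together in Python, assuming the `stream` helper and `messages` structure from the previous sections:

```python
def main() -> None:
    # System prompt that sets the assistant's tone.
    messages = [
        {
            "role": "system",
            "content": "You are a friendly, casual assistant. Use emojis.",
        }
    ]
    user_input = input("You: ")
    # Loop until the user types "exit".
    while user_input.strip().lower() != "exit":
        # Add the user's message to the conversation history.
        messages.append({"role": "user", "content": user_input})
        # Stream the reply and keep it in the history for context.
        reply = stream(messages)
        messages.append({"role": "assistant", "content": reply})
        user_input = input("You: ")


if __name__ == "__main__":
    main()
```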
Best practices
Follow these best practices to ensure that your chatbot behaves as expected:
- Use system messages: Including a system message can guide the behavior of the assistant, setting expectations for its tone and responsiveness.
- Maintain context: Ensure that all relevant parts of the conversation are included in the `messages` array to maintain context, as the model doesn't retain memory of past interactions.
- Handle errors gracefully: Implement error handling for various HTTP status codes and API-specific errors such as rate limits or malformed requests.
- Manage conversational flow: Regularly review the conversation's context and adjust it to keep interactions relevant and concise, especially under the model's token limit; see the sketch after this list for one simple approach.
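One simple way to stay under a token budget is to drop the oldest non-system messages once the history grows too long. The sketch below uses message count as a crude stand-in for a real token count:

```python
def trim_history(messages: list[dict], max_messages: int = 20) -> list[dict]:
    """Keep the system prompt plus only the most recent messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-max_messages:]
```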