The chat completion endpoint allows you to create a conversation between a user and an AI chat model.

This guide introduces the chat completion endpoint and shows how to create a multi-turn conversation with an LLM, where the conversation history is preserved so the model understands the context of the conversation.

You need an API key to access the Writer API. Get an API key by following the steps in the API quickstart.

We recommend setting the API key as an environment variable in a .env file with the name WRITER_API_KEY.
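
If you're working in Python, here's a minimal sketch of loading that key and initializing the SDK client. It assumes you've installed the writerai and python-dotenv packages.

from dotenv import load_dotenv
from writerai import Writer

# Load WRITER_API_KEY from the .env file into the environment.
load_dotenv()

# The client looks for the WRITER_API_KEY environment variable,
# so you don't need to pass the key explicitly.
client = Writer()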

Chat completion versus text generation

The chat completion endpoint is similar to the text generation endpoint, but it is designed to handle conversations between a user and an LLM.

The chat completion endpoint can generate a single message or carry on a multi-turn conversation between a user and an LLM. The text generation endpoint, by contrast, generates a single text response to a given prompt.

Additionally, the chat completion endpoint offers tool calling, which you can use to access domain-specific LLMs, Knowledge Graphs, and custom functions.

Endpoint overview

URL: POST https://api.writer.com/v1/chat

Using the /chat endpoint results in charges for model usage. See the pricing page for more information.

curl --location 'https://api.writer.com/v1/chat' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $WRITER_API_KEY" \
--data '{
    "model": "palmyra-x-004",
    "temperature": 1.5,
    "messages": [
        {
            "role": "user",
            "content": "You are an expert at writing concise product descriptions for an E-Commerce Retailer"
        },
        {
            "role": "assistant",
            "content": "Okay, great I can help write these descriptions. Do you have a specific product in mind?"
        },
        {
            "role": "user",
            "content": "Please write a one sentence product description for a cozy, stylish sweater suitable for both casual and formal occasions"
        }
    ]
}'
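
The same request with the Python SDK looks like the following sketch. It assumes a client initialized as shown above, and mirrors the request body fields as keyword arguments.

response = client.chat.chat(
    model="palmyra-x-004",
    temperature=1.5,
    messages=[
        {
            "role": "user",
            "content": "You are an expert at writing concise product descriptions for an e-commerce retailer",
        },
        {
            "role": "assistant",
            "content": "Okay, great I can help write these descriptions. Do you have a specific product in mind?",
        },
        {
            "role": "user",
            "content": "Please write a one sentence product description for a cozy, stylish sweater suitable for both casual and formal occasions",
        },
    ],
)

# The generated text is in choices[0].message.content.
print(response.choices[0].message.content)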

Request body

Below are the required and commonly used optional parameters for the chat completion endpoint.

Parameter | Type | Description
--- | --- | ---
model | string | Required. The ID of the model to use for the chat completion. Can only be palmyra-x-004.
messages | array | Required. The conversation history.
messages[].role | string | Required. The role of the message sender. Can be user or assistant.
messages[].content | string | Required. The content of the message.
temperature | float | Influences the randomness of the generated text. Defaults to 1. Increase the value for more creative responses, and decrease it for more predictable responses.
stream | Boolean | Whether to stream the response. Defaults to false.

See the chat completion endpoint reference for more information on the request body and the additional parameters you can use to control the conversation.
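
For example, the following sketch sends the same prompt at two temperature values; it assumes the SDK accepts the optional request body fields as keyword arguments, as in the earlier examples.

# Lower temperature: more predictable, focused phrasing.
predictable = client.chat.chat(
    model="palmyra-x-004",
    temperature=0.2,
    messages=[{"role": "user", "content": "Describe a cozy sweater in one sentence."}],
)

# Higher temperature: more varied, creative phrasing.
creative = client.chat.chat(
    model="palmyra-x-004",
    temperature=1.5,
    messages=[{"role": "user", "content": "Describe a cozy sweater in one sentence."}],
)

print(predictable.choices[0].message.content)
print(creative.choices[0].message.content)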

Response parameters

Non-streaming response

If you set the stream parameter to false, the response is delivered as a single JSON object. It contains several parameters describing the response, including the choices array, which contains the generated text.

Parameter | Type | Description
--- | --- | ---
model | string | The ID of the model used to generate the response.
choices | array | An array containing one object with the generated text and additional information.
choices[0].message.content | string | The generated text.

See the full list of response parameters in the chat completion endpoint reference.

non-streaming response
{
  "id": "f7aed821-58cf-4210-9d73-538b2cb8ae44",
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "content": "Elevate your wardrobe with this versatile, cozy sweater that seamlessly transitions from casual days to formal evenings.",
        "role": "assistant",
        "tool_calls": null,
        "graph_data": {
          "sources": null,
          "status": null,
          "subqueries": null
        },
        "llm_data": null,
        "image_data": null,
        "refusal": null
      },
      "logprobs": null
    }
  ],
  "created": 1741891192,
  "model": "palmyra-x-004",
  "usage": {
    "prompt_tokens": 96,
    "total_tokens": 118,
    "completion_tokens": 22,
    "prompt_token_details": null,
    "completion_tokens_details": null
  },
  "system_fingerprint": "v1",
  "service_tier": null
}
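
With the Python SDK, these fields are available as attributes on the returned object. A sketch, assuming a client initialized as in the earlier examples:

response = client.chat.chat(
    model="palmyra-x-004",
    messages=[{"role": "user", "content": "Write a one sentence description of a cozy sweater."}],
)

# Mirrors choices[0].message.content and usage.total_tokens in the JSON above.
print(response.choices[0].message.content)
print(f"Total tokens: {response.usage.total_tokens}")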

Streaming response

If you set the stream parameter to true, the response is delivered as server-sent events. Each event contains several parameters; the content of each chunk is in the choices[0].delta.content parameter.

Parameter | Type | Description
--- | --- | ---
choices[0].delta.content | string | The content of the chunk.

streaming response
data: {
  "id": "3bb941e2-4dab-4ceb-b4e3-45a429f40c72",
  "object": "chat.completion.chunk",
  "choices": [
    {
      "index": 0,
      "finish_reason": null,
      "message": {
        "content": "This",
        "role": "assistant",
        "tool_calls": null,
        "graph_data": {
          "sources": null,
          "status": null,
          "subqueries": null
        },
        "llm_data": null,
        "image_data": null,
        "refusal": null
      },
      "delta": {
        "content": "This",
        "role": "assistant",
        "tool_calls": null,
        "graph_data": {
          "sources": null,
          "status": null,
          "subqueries": null
        },
        "llm_data": null,
        "image_data": null,
        "refusal": null
      },
      "logprobs": null
    }
  ],
  "created": 1741891257,
  "model": "palmyra-x-004",
  "usage": null,
  "system_fingerprint": "v1",
  "service_tier": null
}
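
To consume these events with the Python SDK, you can iterate over the returned stream. Here's a minimal sketch; the sample application below wraps the same pattern in a reusable function.

stream = client.chat.chat(
    model="palmyra-x-004",
    messages=[{"role": "user", "content": "Write a one sentence description of a cozy sweater."}],
    stream=True,
)

for chunk in stream:
    # Each chunk's text is in choices[0].delta.content, which can be empty.
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)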

Sample application

The following sample application uses the Python SDK to create a command-line chatbot.

The application asks the user for input, passes the conversation history to the LLM, and displays the response from the LLM. It loops until the user enters the message exit.

Stream chat responses

The sample application streams the responses from the LLM. Streaming improves the user experience by showing the response as it's generated, so the user sees output sooner than if they waited for the complete response.

Below is the function the sample application uses to stream the chat responses. It iterates over the chat response chunks and prints them to the console as they arrive. It also collects the full text of the response and returns it so it can be added to the conversation history.

def stream_chat_response(response):
    """
    Stream the chat response from the LLM and return the final response.
    """
    output = ""
    for chunk in response:
        # Each streamed chunk carries its text in choices[0].delta.content.
        if chunk.choices[0].delta.content:
            output += chunk.choices[0].delta.content
            print(chunk.choices[0].delta.content, end="", flush=True)
    return output

Full application

The following is the complete sample application that uses the function defined above to stream the chat responses.

The application asks the user for an initial message, and then enters a loop to handle the conversation between the user and the LLM. It adds the user’s message to the conversation history and streams the response from the LLM to the user.

The loop continues until the user enters the message exit.

from writerai import Writer

# Initialize the client. If you don't pass the `api_key` parameter,
# the client looks for the `WRITER_API_KEY` environment variable.
client = Writer()

input_message = "\nEnter a message for the assistant. Type 'exit' to end the conversation. > "


def stream_chat_response(response):
    """
    Stream the chat response from the LLM and return the final response.
    """
    output = ""
    for chunk in response:
        if chunk.choices[0].delta.content:
            output += chunk.choices[0].delta.content
            print(chunk.choices[0].delta.content, end="", flush=True)
    return output


# Start the conversation history with the user's first message.
initial_message = input(input_message)
messages = [{"role": "user", "content": initial_message}]

# Main loop to handle the conversation.
while True:
    chat_response = client.chat.chat(messages=messages, model="palmyra-x-004", stream=True)
    message = stream_chat_response(chat_response)
    messages.append({"role": "assistant", "content": message})
    new_message = input(input_message)
    if new_message == "exit":
        break
    messages.append({"role": "user", "content": new_message})

Best practices

Follow these best practices to ensure that your chatbot behaves as expected:

  • Use system messages: Including a system message can guide the behavior of the assistant, setting expectations for its tone and responsiveness.
  • Maintain context: Ensure that all relevant parts of the conversation are included in the messages array to maintain context, as the model doesn’t retain memory of past interactions.
  • Handle errors gracefully: Implement error handling for various HTTP status codes and API-specific errors such as rate limits or malformed requests, as shown in the sketch after this list.
  • Manage conversational flow: Regularly review the conversation’s context and adjust it to keep interactions relevant and concise, especially under the model’s token limit.
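
For example, here's a minimal error-handling sketch. It assumes the Python SDK raises exception classes such as writerai.RateLimitError and writerai.APIStatusError; check the SDK reference for the exact names in your version.

import writerai
from writerai import Writer

client = Writer()

try:
    response = client.chat.chat(
        model="palmyra-x-004",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)
except writerai.RateLimitError:
    # HTTP 429: back off before retrying.
    print("Rate limit reached; wait before sending more requests.")
except writerai.APIStatusError as e:
    # Other non-success status codes, such as malformed requests.
    print(f"Request failed with status code {e.status_code}.")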

Next steps

Now that you’ve created a chatbot, learn how to add tool calling to your application to enhance the functionality with domain-specific LLMs, Knowledge Graphs, and custom functions.