The chat completion endpoint allows you to create a conversation between a user and an AI chat model.

This guide introduces the chat completion endpoint and shows how to create a multi-turn conversation with an LLM, where the conversation history is preserved so the model understands the context of the conversation.

You need an API key to access the Writer API. Get an API key by following the steps in the API quickstart.

We recommend setting the API key as an environment variable in a .env file with the name WRITER_API_KEY.
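
If you're working in Python, here's a minimal sketch of loading that key and initializing the SDK client. It assumes you've installed the writerai and python-dotenv packages.

from dotenv import load_dotenv
from writerai import Writer

# Load WRITER_API_KEY from the .env file into the environment.
load_dotenv()

# The client looks for the WRITER_API_KEY environment variable,
# so you don't need to pass the key explicitly.
client = Writer()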

Chat completion versus text generation

The chat completion endpoint is similar to the text generation endpoint, but it is designed to handle conversations between a user and an LLM.

The chat completion endpoint can generate a single message or carry on a multi-turn conversation between a user and an LLM. The text generation endpoint, by contrast, generates a single text response to a given prompt.

Additionally, the chat completion endpoint offers tool calling, which you can use to access domain-specific LLMs, Knowledge Graphs, and custom functions.

Endpoint overview

URL: POST https://api.writer.com/v1/chat

Using the /chat endpoint results in charges for model usage. See the pricing page for more information.

curl --location 'https://api.writer.com/v1/chat' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $WRITER_API_KEY" \
--data '{
    "model": "palmyra-x-004",
    "temperature": 1.5,
    "messages": [
        {
            "role": "user",
            "content": "You are an expert at writing concise product descriptions for an E-Commerce Retailer"
        },
        {
            "role": "assistant",
            "content": "Okay, great I can help write these descriptions. Do you have a specific product in mind?"
        },
        {
            "role": "user",
            "content": "Please write a one sentence product description for a cozy, stylish sweater suitable for both casual and formal occasions"
        }
    ]
}'
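
The same request with the Python SDK looks like the following sketch. It assumes a client initialized as shown above, and mirrors the request body fields as keyword arguments.

response = client.chat.chat(
    model="palmyra-x-004",
    temperature=1.5,
    messages=[
        {
            "role": "user",
            "content": "You are an expert at writing concise product descriptions for an e-commerce retailer",
        },
        {
            "role": "assistant",
            "content": "Okay, great I can help write these descriptions. Do you have a specific product in mind?",
        },
        {
            "role": "user",
            "content": "Please write a one sentence product description for a cozy, stylish sweater suitable for both casual and formal occasions",
        },
    ],
)

# The generated text is in choices[0].message.content.
print(response.choices[0].message.content)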

Request body

Below are the required and commonly used optional parameters for the chat completion endpoint.

Parameter | Type | Description
--- | --- | ---
model | string | Required. The ID of the model to use for the chat completion. Can only be palmyra-x-004.
messages | array | Required. The conversation history.
messages[].role | string | Required. The role of the message sender. Can be user or assistant.
messages[].content | string | Required. The content of the message.
temperature | float | Influences the randomness of the generated text. Defaults to 1. Increase the value for more creative responses, and decrease it for more predictable responses.
stream | Boolean | Whether to stream the response. Defaults to false.

See the chat completion endpoint reference for more information on the request body and the additional parameters you can use to control the conversation.
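
For example, the following sketch sends the same prompt at two temperature values; it assumes the SDK accepts the optional request body fields as keyword arguments, as in the earlier examples.

# Lower temperature: more predictable, focused phrasing.
predictable = client.chat.chat(
    model="palmyra-x-004",
    temperature=0.2,
    messages=[{"role": "user", "content": "Describe a cozy sweater in one sentence."}],
)

# Higher temperature: more varied, creative phrasing.
creative = client.chat.chat(
    model="palmyra-x-004",
    temperature=1.5,
    messages=[{"role": "user", "content": "Describe a cozy sweater in one sentence."}],
)

print(predictable.choices[0].message.content)
print(creative.choices[0].message.content)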

Response parameters

Non-streaming response

If you set the stream parameter to false, the response is delivered as a single JSON object. It contains several parameters describing the response, including the choices array, which contains the generated text.

Parameter | Type | Description
--- | --- | ---
model | string | The ID of the model used to generate the response.
choices | array | An array containing one object with the generated text and additional information.
choices[0].message.content | string | The generated text.

See the full list of response parameters in the chat completion endpoint reference.

non-streaming response
{
  "id": "f7aed821-58cf-4210-9d73-538b2cb8ae44",
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "content": "Elevate your wardrobe with this versatile, cozy sweater that seamlessly transitions from casual days to formal evenings.",
        "role": "assistant",
        "tool_calls": null,
        "graph_data": {
          "sources": null,
          "status": null,
          "subqueries": null
        },
        "llm_data": null,
        "image_data": null,
        "refusal": null
      },
      "logprobs": null
    }
  ],
  "created": 1741891192,
  "model": "palmyra-x-004",
  "usage": {
    "prompt_tokens": 96,
    "total_tokens": 118,
    "completion_tokens": 22,
    "prompt_token_details": null,
    "completion_tokens_details": null
  },
  "system_fingerprint": "v1",
  "service_tier": null
}
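
With the Python SDK, these fields are available as attributes on the returned object. A sketch, assuming a client initialized as in the earlier examples:

response = client.chat.chat(
    model="palmyra-x-004",
    messages=[{"role": "user", "content": "Write a one sentence description of a cozy sweater."}],
)

# Mirrors choices[0].message.content and usage.total_tokens in the JSON above.
print(response.choices[0].message.content)
print(f"Total tokens: {response.usage.total_tokens}")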

Streaming response

If you set the stream parameter to true, the response is delivered as server-sent events. Each event contains several parameters; the content of each chunk is in the choices[0].delta.content parameter.

Parameter | Type | Description
--- | --- | ---
choices[0].delta.content | string | The content of the chunk.

streaming response
data: {
  "id": "3bb941e2-4dab-4ceb-b4e3-45a429f40c72",
  "object": "chat.completion.chunk",
  "choices": [
    {
      "index": 0,
      "finish_reason": null,
      "message": {
        "content": "This",
        "role": "assistant",
        "tool_calls": null,
        "graph_data": {
          "sources": null,
          "status": null,
          "subqueries": null
        },
        "llm_data": null,
        "image_data": null,
        "refusal": null
      },
      "delta": {
        "content": "This",
        "role": "assistant",
        "tool_calls": null,
        "graph_data": {
          "sources": null,
          "status": null,
          "subqueries": null
        },
        "llm_data": null,
        "image_data": null,
        "refusal": null
      },
      "logprobs": null
    }
  ],
  "created": 1741891257,
  "model": "palmyra-x-004",
  "usage": null,
  "system_fingerprint": "v1",
  "service_tier": null
}
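
To consume these events with the Python SDK, you can iterate over the returned stream. Here's a minimal sketch; the sample application below wraps the same pattern in a reusable function.

stream = client.chat.chat(
    model="palmyra-x-004",
    messages=[{"role": "user", "content": "Write a one sentence description of a cozy sweater."}],
    stream=True,
)

for chunk in stream:
    # Each chunk's text is in choices[0].delta.content, which can be empty.
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)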

Sample application

The following sample application uses the Python SDK to create a command-line chatbot.

The application asks the user for input, passes the conversation history to the LLM, and displays the response from the LLM. It loops until the user enters the message exit.

Stream chat responses

The sample application streams the responses from the LLM. Streaming improves the user experience by showing the response as it's generated, so the user sees output sooner than if they waited for the complete response.

Below is the function the sample application uses to stream the chat responses. It iterates over the chat response chunks and prints them to the console as they arrive. It also collects the full text of the response and returns it so it can be added to the conversation history.

def stream_chat_response(response):
    """
    Stream the chat response from the LLM and return the final response.
    """
    output = ""
    for chunk in response:
        # Each streamed chunk carries its text in choices[0].delta.content.
        if chunk.choices[0].delta.content:
            output += chunk.choices[0].delta.content
            print(chunk.choices[0].delta.content, end="", flush=True)
    return output

Full application

The following is the complete sample application that uses the function defined above to stream the chat responses.

The application asks the user for an initial message, and then enters a loop to handle the conversation between the user and the LLM. It adds the user’s message to the conversation history and streams the response from the LLM to the user.

The loop continues until the user enters the message exit.

from writerai import Writer

# Initialize the client. If you don't pass the `api_key` parameter,
# the client looks for the `WRITER_API_KEY` environment variable.
client = Writer()

input_message = "\nEnter a message for the assistant. Type 'exit' to end the conversation. > "


def stream_chat_response(response):
    """
    Stream the chat response from the LLM and return the final response.
    """
    output = ""
    for chunk in response:
        if chunk.choices[0].delta.content:
            output += chunk.choices[0].delta.content
            print(chunk.choices[0].delta.content, end="", flush=True)
    return output


# Start the conversation history with the user's first message.
initial_message = input(input_message)
messages = [{"role": "user", "content": initial_message}]

# Main loop to handle the conversation.
while True:
    chat_response = client.chat.chat(messages=messages, model="palmyra-x-004", stream=True)
    message = stream_chat_response(chat_response)
    messages.append({"role": "assistant", "content": message})
    new_message = input(input_message)
    if new_message == "exit":
        break
    messages.append({"role": "user", "content": new_message})

Best practices

Follow these best practices to ensure that your chatbot behaves as expected:

  • Use system messages: Including a system message can guide the behavior of the assistant, setting expectations for its tone and responsiveness.
  • Maintain context: Ensure that all relevant parts of the conversation are included in the messages array to maintain context, as the model doesn’t retain memory of past interactions.
  • Handle errors gracefully: Implement error handling for various HTTP status codes and API-specific errors such as rate limits or malformed requests, as shown in the sketch after this list.
  • Manage conversational flow: Regularly review the conversation’s context and adjust it to keep interactions relevant and concise, especially under the model’s token limit.
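
For example, here's a minimal error-handling sketch. It assumes the Python SDK raises exception classes such as writerai.RateLimitError and writerai.APIStatusError; check the SDK reference for the exact names in your version.

import writerai
from writerai import Writer

client = Writer()

try:
    response = client.chat.chat(
        model="palmyra-x-004",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)
except writerai.RateLimitError:
    # HTTP 429: back off before retrying.
    print("Rate limit reached; wait before sending more requests.")
except writerai.APIStatusError as e:
    # Other non-success status codes, such as malformed requests.
    print(f"Request failed with status code {e.status_code}.")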

Next steps

Now that you’ve created a chatbot, learn how to add tool calling to your application to enhance the functionality with domain-specific LLMs, Knowledge Graphs, and custom functions.