# Generate chat completions

> Create multi-turn conversations with AI using the chat completion API. Preserve conversation context and use tool calling for advanced features.

<Warning>
  **Deprecation notice**: The following models are deprecated and will be removed on **July 13, 2026**: `palmyra-x-003-instruct`, `palmyra-vision`, `palmyra-med`, `palmyra-fin`, and `palmyra-creative`.

  **Migration path**: Use [`palmyra-x5`](/home/models#palmyra-x5) as the replacement for all deprecated models. Palmyra X5 supports a 1M-token context window and covers general-purpose, financial, medical, and creative use cases. For vision workloads, use [chat with images](/home/chat-with-images) with Palmyra X5 instead of `palmyra-vision`. See the [deprecation policy](/home/models#deprecation-policy) for more information.
</Warning>

The [chat completion endpoint](/api-reference/completion-api/chat-completion) allows you to create a conversation between a user and an AI-assisted chat model.

This guide introduces the chat completion endpoint and shows how to create a multi-turn conversation with an LLM, where the conversation history is preserved so the model understands the context of the conversation.

<Note>
  You need an API key to access the Writer API. Get an API key by following the steps in the [API quickstart](/home/quickstart).

  We recommend storing the API key in an environment variable named `WRITER_API_KEY`, for example in a `.env` file.
</Note>

## Chat completion versus text generation

The chat completion endpoint is similar to the [text generation endpoint](/home/text-generation), but it is designed to handle conversations between a user and an LLM.

The chat completion endpoint can generate single messages, or create more complex conversations between a user and an LLM. The text generation endpoint is designed to generate a single text response based on a given prompt.

Additionally, the chat completion endpoint offers [tool calling](/home/tool-calling), which you can use to access domain-specific LLMs, Knowledge Graphs, and custom functions.

## Endpoint overview

**URL:** `POST https://api.writer.com/v1/chat`

<Warning>
  Using the `/chat` endpoint results in charges for **model usage**. See the [pricing page](/home/pricing) for more information.
</Warning>

<CodeGroup>
  ```bash cURL theme={null}
  curl --location 'https://api.writer.com/v1/chat' \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer $WRITER_API_KEY" \
  --data '{
      "model": "palmyra-x5",
      "messages": [
          {
              "role": "user",
              "content": "You are an expert at writing concise product descriptions for an E-Commerce Retailer"
          },
          {
              "role": "assistant",
              "content": "Okay, great I can help write these descriptions. Do you have a specific product in mind?"
          },
          {
              "role": "user",
              "content": "Please write a one sentence product description for a cozy, stylish sweater suitable for both casual and formal occasions"
          }
      ]
  }'
  ```

  ```python Python theme={null}
  from writerai import Writer

  # Initialize the client. If you don't pass the `api_key` parameter,
  # the client looks for the `WRITER_API_KEY` environment variable.
  client = Writer()

  chat_response = client.chat.chat(
    messages=[
      {
          "role": "user",
          "content": "You are an expert at writing concise product descriptions for an E-Commerce Retailer"
      },
      {
          "role": "assistant",
          "content": "Okay, great I can help write these descriptions. Do you have a specific product in mind?"
      },
      {
          "role": "user",
          "content": "Please write a one sentence product description for a cozy, stylish sweater suitable for both casual and formal occasions"
      }
    ],
    model="palmyra-x5"
  )

  print(chat_response.choices[0].message.content)
  ```

  ```javascript JavaScript theme={null}
  import { Writer } from 'writer-sdk';

  // Initialize the Writer client. If you don't pass the `apiKey` parameter,
  // the client looks for the `WRITER_API_KEY` environment variable.
  const client = new Writer();  

  const chatResponse = await client.chat.chat({
    messages: [
      {
          role: "user",
          content: "You are an expert at writing concise product descriptions for an E-Commerce Retailer"
      },
      {
          role: "assistant",
          content: "Okay, great I can help write these descriptions. Do you have a specific product in mind?"
      },
      {
          role: "user",
          content: "Please write a one sentence product description for a cozy, stylish sweater suitable for both casual and formal occasions"
      }
    ],
    model: 'palmyra-x5'
  });

  console.log(chatResponse.choices[0].message.content);
  ```
</CodeGroup>

### Request body

Below are the required and commonly used optional parameters for the chat completion endpoint.

| Parameter            | Type    | Description                                                                                                                                                                                                                                                                                                                       |
| -------------------- | ------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `model`              | string  | **Required**. The [ID of the model](/home/models) to use for the chat completion. Can be `palmyra-x4`, `palmyra-x5`, `palmyra-fin`, `palmyra-med`, `palmyra-creative`, or `palmyra-x-003-instruct`.                                                                                                                               |
| `messages`           | array   | **Required**. The conversation history.                                                                                                                                                                                                                                                                                           |
| `messages[].role`    | string  | **Required**. The role of the message sender. Can be `user`, `assistant`, `system`, or `tool`. `system` messages are system prompts, used to provide instructions to the model. `tool` messages are the result of a [tool call](/home/tool-calling#append-the-result-back-to-the-model), and contain the output of the tool call. |
| `messages[].content` | string  | **Required**. The content of the message.                                                                                                                                                                                                                                                                                         |
| `temperature`        | float   | Temperature influences the randomness in generated text. Defaults to `1`. Increase the value for more creative responses, and decrease the value for more predictable responses.                                                                                                                                                  |
| `stream`             | Boolean | A Boolean value that indicates whether to stream the response. Defaults to `false`.                                                                                                                                                                                                                                               |

See the [chat completion endpoint reference](/api-reference/completion-api/chat-completion) for more information on the request body and the additional parameters you can use to control the conversation.
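
As a rough illustration of how the parameters in the table fit together, the sketch below assembles a request body as a plain dictionary and checks the required fields before anything is sent. `build_chat_request` is a hypothetical helper, not part of the Writer SDK. Unpacking the result into `client.chat.chat(**request)` assumes the SDK accepts `temperature` and `stream` as keyword arguments with the same names as the request body, so check the endpoint reference before relying on it.

```python
# Roles listed in the request body table above.
ALLOWED_ROLES = {"user", "assistant", "system", "tool"}


def build_chat_request(model, messages, temperature=None, stream=False):
    """Hypothetical helper: assemble and sanity-check a chat request body."""
    if not messages:
        raise ValueError("`messages` must contain at least one message")
    for message in messages:
        if message.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"Unsupported role: {message.get('role')!r}")
        # The table above marks `content` as required.
        if "content" not in message:
            raise ValueError("Each message needs a `content` field")

    request = {"model": model, "messages": list(messages), "stream": stream}
    # Only send `temperature` when the caller overrides the default.
    if temperature is not None:
        request["temperature"] = temperature
    return request


request = build_chat_request(
    model="palmyra-x5",
    messages=[
        {"role": "system", "content": "You write concise product descriptions."},
        {"role": "user", "content": "Describe a cozy, stylish sweater in one sentence."},
    ],
    temperature=0.3,
)
print(request)
```

Validating the message list locally is a design choice for catching typos in role names early; the API performs its own validation regardless.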

### Response parameters

#### Non-streaming response

If the `stream` parameter is `false` (the default), the response is delivered as a single JSON object. It contains several parameters describing the response, including the `choices` array, which contains the generated text.

| Parameter                    | Type   | Description                                                                        |
| ---------------------------- | ------ | ---------------------------------------------------------------------------------- |
| `model`                      | string | The ID of the model used to generate the response.                                 |
| `choices`                    | array  | An array containing one object with the generated text and additional information. |
| `choices[0].message.content` | string | The generated text.                                                                |

See the full list of response parameters in the [chat completion endpoint reference](/api-reference/completion-api/chat-completion).

```json non-streaming response [expandable] theme={null}
{
  "id": "f7aed821-58cf-4210-9d73-538b2cb8ae44",
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "content": "Elevate your wardrobe with this versatile, cozy sweater that seamlessly transitions from casual days to formal evenings.",
        "role": "assistant",
        "tool_calls": null,
        "graph_data": {
          "sources": null,
          "status": null,
          "subqueries": null
        },
        "llm_data": null,
        "image_data": null,
        "refusal": null
      },
      "logprobs": null
    }
  ],
  "created": 1741891192,
  "model": "palmyra-x5",
  "usage": {
    "prompt_tokens": 96,
    "total_tokens": 118,
    "completion_tokens": 22,
    "prompt_token_details": null,
    "completion_tokens_details": null
  },
  "system_fingerprint": "v1",
  "service_tier": null
}
```
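
If you call the endpoint over raw HTTP rather than through an SDK, you read these fields from the decoded JSON yourself. The following is a minimal sketch using an abbreviated copy of the response above; `summarize_response` is a hypothetical helper introduced only for illustration.

```python
import json

# Abbreviated copy of the non-streaming response shown above.
raw_response = """
{
  "id": "f7aed821-58cf-4210-9d73-538b2cb8ae44",
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "content": "Elevate your wardrobe with this versatile, cozy sweater that seamlessly transitions from casual days to formal evenings.",
        "role": "assistant"
      }
    }
  ],
  "model": "palmyra-x5",
  "usage": {"prompt_tokens": 96, "total_tokens": 118, "completion_tokens": 22}
}
"""


def summarize_response(payload):
    """Hypothetical helper: pull the commonly used fields out of a response."""
    choice = payload["choices"][0]
    # `usage` can be null in some responses, so fall back to an empty dict.
    usage = payload.get("usage") or {}
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice.get("finish_reason"),
        "model": payload.get("model"),
        "total_tokens": usage.get("total_tokens"),
    }


summary = summarize_response(json.loads(raw_response))
print(summary["text"])
print(f"{summary['model']} used {summary['total_tokens']} tokens")
```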

#### Streaming response

If you set the `stream` parameter to `true`, the response is delivered as [server-sent events](https://html.spec.whatwg.org/multipage/server-sent-events.html#server-sent-events). Each event contains several parameters; the content of the chunk is in the `choices[0].delta.content` parameter. The example below shows a single event, printed as a Python dictionary.

| Parameter                  | Type   | Description               |
| -------------------------- | ------ | ------------------------- |
| `choices[0].delta.content` | string | The content of the chunk. |

```python streaming response [expandable] theme={null}
data: {'id': '3bb941e2-4dab-4ceb-b4e3-45a429f40c72',
 'object': 'chat.completion.chunk',
 'choices': [{'index': 0,
   'finish_reason': None,
   'message': {'content': 'This',
    'role': 'assistant',
    'tool_calls': None,
    'graph_data': {'sources': None, 'status': None, 'subqueries': None},
    'llm_data': None,
    'image_data': None,
    'refusal': None},
   'delta': {'content': 'This',
    'role': 'assistant',
    'tool_calls': None,
    'graph_data': {'sources': None, 'status': None, 'subqueries': None},
    'llm_data': None,
    'image_data': None,
    'refusal': None},
   'logprobs': None}],
 'created': 1741891257,
 'model': 'palmyra-x5',
 'usage': None,
 'system_fingerprint': 'v1',
 'service_tier': None}
```
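
To show how the chunks combine into a full message, here is a minimal sketch that reads `data:` lines and joins the `choices[0].delta.content` values. It assumes each event's `data:` field carries a JSON object on the wire (the example above is a Python dictionary, not raw JSON). `collect_stream_text` and the sample lines are hypothetical and hand-written, not captured API output. The SDK streaming helpers described later in this guide handle this for you.

```python
import json


def collect_stream_text(lines):
    """Hypothetical helper: join `choices[0].delta.content` across SSE lines."""
    parts = []
    for line in lines:
        line = line.strip()
        # Skip blank separators and any non-data SSE fields.
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        try:
            chunk = json.loads(data)
        except json.JSONDecodeError:
            # Ignore payloads that are not valid JSON.
            continue
        if not isinstance(chunk, dict):
            continue
        choices = chunk.get("choices") or []
        if not choices:
            continue
        content = (choices[0].get("delta") or {}).get("content")
        if content:
            parts.append(content)
    return "".join(parts)


# Hand-written sample events for illustration only.
sample_lines = [
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "This", "role": "assistant"}}]}',
    "",
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": " sweater", "role": "assistant"}}]}',
    "",
]
print(collect_stream_text(sample_lines))
```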

## Sample application

The following sample application uses the Python and JavaScript SDKs to create a command-line chatbot.

The application asks the user for input, passes the conversation history to the LLM, and streams the response from the LLM. It loops until the user enters the message `exit`.

### Set a system prompt

To guide the behavior of the assistant, you can set a system prompt by adding a message with the role `system` to the `messages` array.

For example, you can set a system prompt to have the assistant respond in a certain tone or style, or to provide additional context for the conversation. Here's a system prompt that directs the assistant to be casual and use emojis in its responses:

<CodeGroup>
  ```python Python theme={null}
  system_prompt = "You are a helpful assistant that responds in a casual, friendly tone and uses emojis in your responses."

  messages = [
    {"role": "system", "content": system_prompt},
  ]
  ```

  ```javascript JavaScript theme={null}
  const system_prompt = "You are a helpful assistant that responds in a casual, friendly tone and uses emojis in your responses.";

  let messages = [
    {role: "system", content: system_prompt},
  ];
  ```
</CodeGroup>

Learn about [prompting best practices](/home/prompting) to help you create effective system prompts.

### Stream chat responses

The sample application streams the responses from the LLM. Streaming improves the user experience by showing the output as it is generated, which reduces the time the user waits before seeing the first part of the response.

Below are the code snippets from the sample application to stream the chat responses. They use the [`stream` helper method of the `chat` endpoint](/home/streaming#streaming-helpers-for-chat-completions) to print the chat responses to the console in real time.

The `stream` method also collects the final response and returns it so it can be added to the conversation history.

<CodeGroup>
  ```python Python theme={null}
  # Stream the chat response to the user using the `stream` helper method
  with client.chat.stream(messages=messages, model="palmyra-x5") as stream:
      for event in stream:
          # Check if the event is a content delta, which contains this chunk of the chat response
          if event.type == "content.delta":
              print(event.delta, end="", flush=True)
  # Collect the full response from the stream
  completion = stream.get_final_completion().choices[0].message.content
  ```

  ```javascript JavaScript theme={null}
  // Stream the chat response to the user using the `stream` helper method
  const stream = client.chat.stream({
      model: 'palmyra-x5',
      messages: messages,
    })
    // Print this chunk of the chat response to the console
    .on('content', (diff) => process.stdout.write(diff));

  // Collect the full response from the stream
  let full_message = await stream.finalChatCompletion();
  full_message = full_message.choices[0].message.content;
  ```
</CodeGroup>

### Full application

The following is the complete sample application. It combines the snippets above to stream the chat responses.

The application sets a system prompt, asks the user for an initial message, and then enters a loop to handle the conversation between the user and the LLM. It adds the user's message to the conversation history and streams the response from the LLM to the user.

The loop continues until the user enters the message `exit`.

<CodeGroup>
  ```python Python theme={null}
  from writerai import Writer

  # Initialize the client. If you don't pass the `api_key` parameter,
  # the client looks for the `WRITER_API_KEY` environment variable.
  client = Writer()

  end = False
  system_prompt = "You are a helpful assistant that responds in a casual, friendly tone and uses emojis in your responses."

  # Ask the user for an initial message
  input_message = "\nEnter a message for the assistant. Type 'exit' to end the conversation. > "
  initial_message = input(input_message)

  # Add the user's message to the conversation history
  messages = [{"role": "system", "content": system_prompt}, {"role": "user", "content": initial_message}]

  # Main loop to handle the conversation.
  while not end:
      # Stream the chat response to the user using the `stream` helper method
      with client.chat.stream(messages=messages, model="palmyra-x5") as stream:
          for event in stream:
              # Check if the event is a content delta, which contains this chunk of the chat response
              if event.type == "content.delta":
                  print(event.delta, end="", flush=True)
      # Collect the full response from the stream and add it to the conversation history
      full_message = stream.get_final_completion().choices[0].message.content
      messages.append({"role": "assistant", "content": full_message})
      new_message = input(input_message)
      messages.append({"role": "user", "content": new_message})
      if new_message == "exit":
          end = True
  ```

  ```javascript JavaScript theme={null}
  import { Writer } from 'writer-sdk';

  // Initialize the Writer client. If you don't pass the `apiKey` parameter,
  // the client looks for the `WRITER_API_KEY` environment variable.
  const client = new Writer();  

  // helpers for reading user input from the command line
  // ----------------------------------------------------------------------------
  const readline = await import('readline');

  // Create an interface for input and output
  const rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout
  });

  // Prompt the user for input.
  const prompt = (question) => {
    return new Promise((resolve) => {
      rl.question(question, (answer) => {
        resolve(answer);
      });
    });
  };
  // ----------------------------------------------------------------------------

  let end = false;
  let system_prompt = "You are a helpful assistant that responds in a casual, friendly tone and uses emojis in your responses.";
  let initial_message = await prompt("Enter a message for the assistant. Type 'exit' to end the conversation. > ");
  let messages = [{"role": "system", "content": system_prompt}, {"role": "user", "content": initial_message}];

  // Main loop to handle the conversation.
  while (!end) {
    // Stream the chat response to the user using the `stream` helper method
    const stream = client.chat.stream({
      model: 'palmyra-x5',
      messages: messages,
    })
    // Print this chunk of the chat response to the console
    .on('content', (diff) => process.stdout.write(diff));

    // Collect the full response from the stream and add it to the conversation history
    let full_message = await stream.finalChatCompletion();
    full_message = full_message.choices[0].message.content;
    messages.push({role: "assistant", content: full_message});
    console.log("\n");
    let new_message = await prompt("Enter a message for the assistant. Type 'exit' to end the conversation. > ");
    messages.push({role: "user", content: new_message});
    if (new_message === "exit") {
        end = true;
    }
  }

  // Close the readline interface so the Node.js process can exit
  rl.close();
  ```
</CodeGroup>

## Best practices

Follow these best practices to ensure that your chatbot behaves as expected:

* **Use system messages**: Including a system message can guide the behavior of the assistant, setting expectations for its tone and responsiveness.
* **Maintain context**: Ensure that all relevant parts of the conversation are included in the `messages` array to maintain context, as the model doesn't retain memory of past interactions.
* **Handle errors gracefully**: Implement [error handling](/api-reference/error-handling) for various HTTP status codes and API-specific errors such as rate limits or malformed requests.
* **Manage conversational flow**: Regularly review the conversation history and trim or summarize older messages so that interactions stay relevant and the request stays within the model's token limit.
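
One simple way to keep the history bounded is to retain the system prompt and only the most recent messages. The sketch below is a minimal example of that idea; `trim_history` and its `max_messages` parameter are hypothetical, and message count is only a crude stand-in for the token count that the model's limit is actually measured in.

```python
def trim_history(messages, max_messages=10):
    """Hypothetical helper: keep system messages plus the most recent turns."""
    # System messages are placed first; this preserves the original order
    # when the system prompt is at the start of the history.
    system_messages = [m for m in messages if m["role"] == "system"]
    other_messages = [m for m in messages if m["role"] != "system"]
    # Keep only the newest `max_messages` non-system messages.
    recent = other_messages[-max_messages:] if max_messages > 0 else []
    return system_messages + recent


history = [{"role": "system", "content": "Respond casually and use emojis."}]
for turn in range(1, 4):
    history.append({"role": "user", "content": f"question {turn}"})
    history.append({"role": "assistant", "content": f"answer {turn}"})

trimmed = trim_history(history, max_messages=4)
print([m["content"] for m in trimmed])
```

In the sample application, you could call a helper like this on `messages` before each request. If your conversation includes tool calls, take care not to drop a tool call while keeping its result, or the reverse.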

## Next steps

Now that you've created a chatbot, learn how to [add tool calling](/home/tool-calling) to your application to enhance the functionality with domain-specific LLMs, Knowledge Graphs, and custom functions.

* [Create custom functions as tools](/home/tool-calling)
* [Access information in Knowledge Graphs](/home/kg-chat)
* [Delegate tasks to domain-specific LLMs](/home/model-delegation)
