The Writer API supports streaming responses. Streams use server-sent events (SSE) so that you can display content as the API generates it in real time.

Streaming improves the user experience by reducing the perceived latency between the time the user submits a request and the time the first part of the response appears.

Overview

To stream a response from the API, set the stream parameter to true in the request body.

The endpoints that support streaming include text generation (completions) and chat completions, both shown in the examples below.

Sample request and response

The code below shows a streaming text generation request using curl, along with the response, which arrives as a stream of server-sent events. A sketch of how to consume the stream without an SDK follows the sample response.

curl --location 'https://api.writer.com/v1/completions' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $WRITER_API_KEY" \
--data '{
  "model": "palmyra-x-004",
  "prompt": "Respond to a customer chat request about a delayed shipment with a message that apologizes for the delay, offers a tracking number, and provides a new estimated delivery date.",
  "stream": true
}'
Response:
data: {"value":""}

data: {"value":"Hello"}

data: {"value":" ["}

data: {"value":"Customer"}

data: {"value":"'s"}

data: {"value":" Name"}
...
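
If you aren't using a Writer SDK, you can consume the event stream with any HTTP client that supports streaming responses. The sketch below is one way to do it in Python, assuming the third-party httpx library (not required by the API): it sends the same request as the curl example above and parses each data: line as it arrives.

# A minimal sketch of reading the SSE stream without an SDK.
# Assumes the third-party `httpx` library; any streaming HTTP client works.
import json
import os

import httpx

payload = {
    "model": "palmyra-x-004",
    "prompt": "Respond to a customer chat request about a delayed shipment with a message that apologizes for the delay, offers a tracking number, and provides a new estimated delivery date.",
    "stream": True,
}
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {os.environ['WRITER_API_KEY']}",
}

with httpx.stream(
    "POST",
    "https://api.writer.com/v1/completions",
    json=payload,
    headers=headers,
    timeout=None,
) as response:
    for line in response.iter_lines():
        # Each event arrives as a line of the form: data: {"value":"..."}
        if not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        try:
            chunk = json.loads(data)
        except json.JSONDecodeError:
            # Skip any non-JSON control lines the stream might include.
            continue
        print(chunk.get("value", ""), end="", flush=True)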

Streaming with SDKs

When you stream a response using a Writer SDK, the SDK creates an iterator that yields chunks of the response. You can iterate over the stream to receive the response.

The examples below use the Python SDK; the JavaScript SDK follows the same iterator pattern for each endpoint that supports streaming.

from writerai import Writer

# Initialize the client. If you don't pass the `api_key` parameter,
# the client looks for the `WRITER_API_KEY` environment variable.
client = Writer()

# Request a streaming text generation response.
text_generation = client.completions.create(
  model="palmyra-x-004",
  prompt="Respond to a customer chat request about a delayed shipment with a message that apologizes for the delay, offers a tracking number, and provides a new estimated delivery date.",
  stream=True
)

# Print each chunk of text as the API generates it.
for chunk in text_generation:
    print(chunk.value, end="", flush=True)
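
If your application is asynchronous, the same pattern works with the async client. The sketch below assumes the SDK exposes AsyncWriter as the async counterpart of Writer and that streaming responses support async iteration; check the SDK reference for your version.

import asyncio

from writerai import AsyncWriter

async def main():
    # Like `Writer`, the async client reads the `WRITER_API_KEY`
    # environment variable when `api_key` isn't passed.
    client = AsyncWriter()

    text_generation = await client.completions.create(
        model="palmyra-x-004",
        prompt="Respond to a customer chat request about a delayed shipment with a message that apologizes for the delay, offers a tracking number, and provides a new estimated delivery date.",
        stream=True
    )

    # Print each chunk of text as the API generates it.
    async for chunk in text_generation:
        print(chunk.value, end="", flush=True)

asyncio.run(main())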

Streaming helpers for chat completions

The Python and Node SDKs include streaming helpers for chat completions. These helpers provide more granular details about the streaming events and accumulate the response.

To use the streaming helpers, call client.chat.stream with the same parameters as a non-streaming chat completion request, but omit the stream parameter.

from writerai import Writer

# Initialize the client. If you don't pass the `api_key` parameter,
# the client looks for the `WRITER_API_KEY` environment variable.
client = Writer()

# Open a managed stream of chat completion events.
with client.chat.stream(
  model="palmyra-x-004",
  messages=[{"role": "user", "content": "Respond to a customer chat request about a delayed shipment with a message that apologizes for the delay, offers a tracking number, and provides a new estimated delivery date."}]
) as stream:
    for event in stream:
        # Print each piece of new content as it arrives.
        if event.type == "content.delta":
            print(event.value, end="", flush=True)

# Print the final, accumulated response.
completion = stream.get_final_completion()
print(completion.choices[0].message.content)

For more information about the streaming helpers for chat completions, see the Python and Node SDKs.