The Vision tool for chat completions allows you to analyze images during a chat completion. You can perform actions such as extracting text, interpreting charts and graphs, performing image-based compliance checks, and more.

The Writer API also has a vision endpoint that you can use to analyze images outside of a chat completion. See the vision API guide for more information.

This guide explains how to use the Vision tool in a chat completion and walks through an example.

You need an API key to access the Writer API. Get an API key by following the steps in the API quickstart.

We recommend setting the API key as an environment variable in a .env file with the name WRITER_API_KEY.
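As a quick sanity check, you can confirm the key is available before making requests. The sketch below reads it from the environment with the Python standard library; if you keep the key in a `.env` file, you'd first load that file into the environment (for example, with a package such as `python-dotenv`, which is an assumption here, not a requirement of the Writer API):

```python
import os

def get_writer_api_key() -> str:
    """Return the Writer API key from the WRITER_API_KEY environment variable.

    Raises a clear error if the variable is missing, so failures surface
    before any API call is made.
    """
    key = os.environ.get("WRITER_API_KEY")
    if not key:
        raise RuntimeError("WRITER_API_KEY is not set")
    return key
```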

Tool structure

The Vision tool allows you to analyze an image during a chat with an LLM.

To use the Vision tool, add it to the tools array in your chat-completion endpoint request.

The Vision tool object has the following structure:

| Parameter | Type | Description |
| --- | --- | --- |
| `type` | string | The type of tool, which is `vision` for the Vision tool |
| `function` | object | An object containing the tool's model and image variables |
| `function.model` | string | The model to use: `palmyra-vision` |
| `function.variables` | array | An array of objects, one for each image to pass to Palmyra Vision |
| `function.variables.name` | string | The name of the image to pass to Palmyra Vision. You must use this name when referencing the image in the message you provide to the chat completion endpoint. Reference the image as `{{name}}`, where `name` is the name you provided in the variables array. |
| `function.variables.file_id` | string | The ID of the uploaded image. You must upload the image to Writer before using it with the Vision tool. Learn more in the upload file guide. |

The message you provide to the chat completion endpoint must reference each image you include in the function.variables array, by name. For example, if you include an image named new_product in the function.variables array, you must reference it in the message as {{new_product}}, with double curly braces around the name. Your message to the chat completion endpoint might look like this: “Provide a two-sentence summary of the product within the image {{new_product}}.”
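These rules can be sketched as a pair of Python helpers. Both are hypothetical illustrations, not part of any Writer SDK: one builds the Vision tool object from a mapping of image names to file IDs, and the other checks that a message references every image with the required `{{name}}` syntax.

```python
def build_vision_tool(images: dict) -> dict:
    """Build a Vision tool entry for the tools array.

    `images` maps each image name to its uploaded file ID.
    """
    return {
        "type": "vision",
        "function": {
            "model": "palmyra-vision",
            "variables": [
                {"name": name, "file_id": file_id}
                for name, file_id in images.items()
            ],
        },
    }

def message_references_all_images(message: str, images: dict) -> bool:
    """Check that the message references every image as {{name}}."""
    return all("{{" + name + "}}" in message for name in images)
```

For example, `message_references_all_images("Summarize {{new_product}}.", {"new_product": "1234567890"})` returns `True`, while a message that omits the `{{new_product}}` reference returns `False`.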

"tools": [
    {
        "type": "vision",
        "function": {
            "model": "palmyra-vision",
            "variables": [
                {
                    "name": "new_product",
                    "file_id": "1234567890"
                }
            ]
        }
    }  
]

You can only pass one prebuilt tool in the tools array at a time. However, you can pass multiple custom tools in the same request.

Response format

For non-streaming responses, the response from the Vision tool is in the choices[0].message.content field. For streaming responses, the response is in the choices[0].delta.content field.

See the chat completion endpoint for more information on the response fields.
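A small helper (a hypothetical sketch, assuming responses are parsed into Python dictionaries) can extract the text from either response shape:

```python
def extract_content(response: dict) -> str:
    """Return the text from a chat completion response or streaming chunk.

    Non-streaming responses carry the text in choices[0].message.content;
    streaming chunks carry it in choices[0].delta.content. Returns an
    empty string when the field is absent or null.
    """
    choice = response["choices"][0]
    if "message" in choice:
        return choice["message"].get("content") or ""
    return choice.get("delta", {}).get("content") or ""
```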

{
  "id": "1234",
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "finish_reason": "tool_calls",
      "message": {
        "content": "The image shows...",
        "role": "assistant",
        "tool_calls": null,
        "graph_data": {
          "sources": null,
          "status": null,
          "subqueries": null
        },
        "llm_data": null,
        "image_data": null,
        "refusal": null
      },
      "logprobs": null
    }
  ],
  "created": 1743740333,
  "model": "palmyra-x-004",
  "usage": {
    "prompt_tokens": 223,
    "total_tokens": 254,
    "completion_tokens": 31,
    "prompt_token_details": null,
    "completion_tokens_details": null
  },
  "system_fingerprint": "v1",
  "service_tier": null
}

Usage example

This example uses palmyra-vision to interpret a graph during a chat completion.

Upload an image to Writer

Before you can use the Vision tool, you must upload the image to Writer.

The following code samples demonstrate how to upload an image and print the file ID. You need the file ID to pass to the Vision tool.

curl -X POST 'https://api.writer.com/v1/files' \
  -H 'Content-Type: image/jpeg' \
  -H 'Content-Disposition: attachment; filename=graph.jpg' \
  -H "Authorization: Bearer $WRITER_API_KEY" \
  --data-binary "@path/to/file/graph.jpg"

Learn more about uploading and managing files.
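The curl command above can also be expressed in Python with only the standard library. This is a minimal sketch: the response is assumed to carry the file ID in an `id` field, so check the upload file guide for the exact response shape.

```python
import json
import urllib.request

UPLOAD_URL = "https://api.writer.com/v1/files"

def upload_headers(filename: str, api_key: str) -> dict:
    """Build the headers the files endpoint expects, mirroring the curl call."""
    return {
        "Content-Type": "image/jpeg",
        "Content-Disposition": f"attachment; filename={filename}",
        "Authorization": f"Bearer {api_key}",
    }

def upload_image(path: str, api_key: str) -> str:
    """Upload a JPEG to Writer and return its file ID.

    Assumes the response JSON contains an `id` field with the file ID.
    """
    filename = path.rsplit("/", 1)[-1]
    with open(path, "rb") as f:
        body = f.read()
    req = urllib.request.Request(
        UPLOAD_URL,
        data=body,
        headers=upload_headers(filename, api_key),
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["id"]
```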

Create a tools array containing the Vision tool

To use the Vision tool, create a tools array that specifies the palmyra-vision model and the images to pass to it.

"tools": [
    {
        "type": "vision",
        "function": {
            "model": "palmyra-vision",
            "variables": [
                {
                    "name": "graph",
                    "file_id": "1234567890"
                }
            ]
        }
    }
]

Send the request using chat completions

Add the tools array to the chat endpoint call along with your array of messages. Setting tool_choice to auto allows the model to choose when to use the Vision tool, based on the message provided in the messages array.

This example streams the response in real time, rather than waiting for the entire response to be generated.

If you are unfamiliar with the chat completions endpoint or streaming vs. non-streaming responses, learn more in the chat completion guide.

curl --location 'https://api.writer.com/v1/chat' \
    --header 'Content-Type: application/json' \
    --header "Authorization: Bearer $WRITER_API_KEY" \
    --data '{
        "model": "palmyra-x-004",
        "temperature": 0.7,
        "messages": [
            {
                "role": "user",
                "content": "Summarize the main trends and findings in the graph {{graph}}."
            }
        ],
        "tool_choice": "auto",
        "tools": [
            {
                "type": "vision",
                "function": {
                    "model": "palmyra-vision",
                    "variables": [
                        {
                            "name": "graph",
                            "file_id": "1234567890"
                        }
                    ]
                }
            }
        ],
        "stream": true
    }'

By following this guide, you can use the Vision tool to have the palmyra-vision model interpret an image during a chat completion.

Next steps

Learn about additional capabilities of the Writer API, such as analyzing unstructured medical documents and context-aware text splitting.