Analyze images in a chat
The Vision tool lets you analyze images during a chat completion. You can perform actions such as extracting text, interpreting charts and graphs, running image-based compliance checks, and more.
The Writer API also has a vision endpoint that you can use to analyze images outside of a chat completion. See the vision API guide for more information.
This guide explains how to use the Vision tool in a chat completion and walks through a usage example.
You need an API key to access the Writer API. Get an API key by following the steps in the API quickstart.
We recommend setting the API key as an environment variable in a `.env` file with the name `WRITER_API_KEY`.
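For example, assuming you use the `python-dotenv` package to load the file, the setup might look like this sketch. The Writer Python SDK also reads `WRITER_API_KEY` from the environment automatically, so you typically don't need to pass the key explicitly.

```python
# .env file contents (one line): WRITER_API_KEY=your-api-key

import os

from dotenv import load_dotenv  # assumes the python-dotenv package is installed

load_dotenv()  # loads variables from .env into the environment
api_key = os.environ["WRITER_API_KEY"]
```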
Tool structure
The Vision tool allows you to analyze an image during a chat with an LLM.
To use the Vision tool, add it to the `tools` array in your chat completion endpoint request.
The Vision tool object has the following structure:
| Parameter | Type | Description |
| --- | --- | --- |
| `type` | string | The type of tool. Use `vision` for the Vision tool. |
| `function` | object | An object containing the model and the image variables to pass to it |
| `function.model` | string | The model to use for image analysis, which is `palmyra-vision` |
| `function.variables` | array | An array of objects, one for each image to pass to Palmyra Vision |
| `function.variables.name` | string | The name of the image to pass to Palmyra Vision. You must use this name when referencing the image in the message you provide to the chat completion endpoint. Reference the image as `{{name}}`, where `name` is the name you provided in the `variables` array. |
| `function.variables.file_id` | string | The ID of the uploaded image. You must upload the image to Writer before using it with the Vision tool. Learn more in the upload file guide. |
The message you provide to the chat completion endpoint must reference each image in the `function.variables` array by name. For example, if you include an image named `new_product` in the `function.variables` array, you must reference it in the message as `{{new_product}}`, with double curly braces around the name. Your message to the chat completion endpoint might look like this: “Provide a two-sentence summary of the product within the image `{{new_product}}`.”
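Putting the structure and the naming rule together, here is a minimal sketch of a Vision tool entry in Python (the image name and File ID are placeholder values):

```python
tools = [
    {
        "type": "vision",
        "function": {
            "model": "palmyra-vision",
            "variables": [
                {"name": "new_product", "file_id": "your-file-id"}
            ],
        },
    }
]

# The message references the image by the name from the variables array.
message = "Provide a two-sentence summary of the product within the image {{new_product}}."
```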
You can only pass one prebuilt tool in the `tools` array at a time. However, you can pass multiple custom tools in the same request.
Prebuilt tools are:
- Vision tool
- Knowledge Graph tool
- LLM tool
Response format
For non-streaming responses, the response from the Vision tool is in the choices[0].message.content
field. For streaming responses, the response is in the choices[0].delta.content
field.
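As a minimal sketch using the Writer Python SDK (the `writerai` package; the chat model, image name, and File ID here are placeholder assumptions), the two access patterns look like this:

```python
from writerai import Writer

client = Writer()  # reads WRITER_API_KEY from the environment

request = dict(
    model="palmyra-x5",
    messages=[{"role": "user", "content": "Describe the image {{photo}}."}],
    tools=[{
        "type": "vision",
        "function": {
            "model": "palmyra-vision",
            "variables": [{"name": "photo", "file_id": "your-file-id"}],
        },
    }],
    tool_choice="auto",
)

# Non-streaming: the full answer is in choices[0].message.content.
print(client.chat.chat(**request).choices[0].message.content)

# Streaming: partial text arrives in choices[0].delta.content.
for chunk in client.chat.chat(**request, stream=True):
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```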
See the chat completion endpoint for more information on the response fields.
Usage example
This example uses `palmyra-vision` to interpret a graph during a chat completion.
Upload an image to Writer
Before you can use the Vision tool, you must upload the image to Writer.
The following code sample demonstrates how to upload an image and print the File ID. You need the File ID to pass to the Vision tool.
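A minimal sketch with the Writer Python SDK (the `writerai` package; the filename is a placeholder, and parameter details may vary by SDK version):

```python
from writerai import Writer

client = Writer()  # reads WRITER_API_KEY from the environment

# Upload the image; the returned object includes the File ID.
with open("sales-graph.png", "rb") as image:
    uploaded = client.files.upload(
        content=image,
        content_type="image/png",
        content_disposition='attachment; filename="sales-graph.png"',
    )
print(uploaded.id)
```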
Learn more about uploading and managing files.
Create a tools array containing the Vision tool
To use the Vision tool, create a `tools` array that specifies `palmyra-vision` as the model and includes the name and File ID of each image, as shown in the sketch below.
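A sketch of the array, continuing from the upload step above (the image name `sales_graph` is a placeholder you choose):

```python
tools = [
    {
        "type": "vision",
        "function": {
            "model": "palmyra-vision",
            "variables": [
                {
                    "name": "sales_graph",
                    "file_id": uploaded.id,  # File ID from the upload step
                }
            ],
        },
    }
]
```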
Send the request using chat completions
Add the tools array to the chat endpoint call along with your array of messages. Setting tool_choice
to auto
allows the model to choose when to use the Vision tool, based on the message provided in the messages
array.
This example streams the response in real time, rather than waiting for the entire response to be generated.
If you are unfamiliar with the chat completions endpoint or streaming vs. non-streaming responses, learn more in the chat completion guide.
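A sketch of the request, continuing from the snippets above and assuming `palmyra-x5` as the chat model:

```python
response = client.chat.chat(
    model="palmyra-x5",
    messages=[
        {
            "role": "user",
            "content": "Describe the main trend shown in the graph {{sales_graph}}.",
        }
    ],
    tools=tools,
    tool_choice="auto",
    stream=True,
)

# Print each chunk of the streamed response as it arrives.
for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```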
By following this guide, you can use the Vision tool to have the `palmyra-vision` model interpret an image during a chat completion.
Next steps
Learn about additional capabilities of the Writer API, such as analyzing unstructured medical documents and context-aware text splitting.