With the vision endpoint, you can analyze one or more images with a prompt. Palmyra Vision lets you ask questions about an image, generate captions, compare images, and more.

The /vision endpoint is available in versions 2.1 and later of the Writer SDKs.

You need an API key to access the Writer API. Get an API key by following the steps in the API quickstart.

We recommend setting the API key as an environment variable in a .env file with the name WRITER_API_KEY.
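As a minimal sketch, you can read the key from the environment in Python using only the standard library. (If you keep the key in a .env file, a loader such as python-dotenv is assumed to populate the environment first; any loader works.)

```python
import os

def get_writer_api_key(env=os.environ):
    """Return the Writer API key from the environment, or raise if unset."""
    key = env.get("WRITER_API_KEY")
    if not key:
        raise RuntimeError("Set the WRITER_API_KEY environment variable")
    return key
```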

Vision endpoint

Endpoint: POST /v1/vision

curl -X POST \
  'https://api.writer.com/v1/vision' \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $WRITER_API_KEY" \
  --data-raw '{
    "model": "palmyra-vision",
    "prompt": "What's the difference between the image {{image_1}} and the image {{image_2}}?",
    "variables": [
      {"name": "image_1", "file_id": "f1234"},
      {"name": "image_2", "file_id": "f5678"}
    ]
  }'

Request body

The request body is a JSON object with the following fields:

| Parameter | Type | Description |
| --- | --- | --- |
| `model` | string | The model to use for the analysis. Must be `palmyra-vision`. |
| `prompt` | string | The prompt to use for the analysis. The prompt must include the names of the images you're analyzing, referencing them as `{{name}}`. For example: `What's the difference between the image {{image_1}} and the image {{image_2}}?` |
| `variables` | array | An array of image variables, each with a `name` and `file_id`. |
| `variables[].name` | string | The name of the image. You must use the same name in the prompt, referencing it as `{{name}}`. |
| `variables[].file_id` | string | The File ID of the uploaded image. You must upload the image to Writer before passing it to the vision endpoint. Learn how to upload images below. |
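Because the API rejects requests that list a variable the prompt never references, it can help to validate the body client-side. The following sketch builds the request body as a plain Python dict; the function name and the pre-flight check are illustrative, not part of the API.

```python
def build_vision_payload(prompt, variables):
    """Build the JSON body for POST /v1/vision.

    `variables` is a list of (name, file_id) pairs. The endpoint returns an
    error for variables that aren't referenced in the prompt, so we check
    for each {{name}} placeholder before sending.
    """
    for name, _ in variables:
        if "{{" + name + "}}" not in prompt:
            raise ValueError(f"prompt does not reference {{{{{name}}}}}")
    return {
        "model": "palmyra-vision",
        "prompt": prompt,
        "variables": [{"name": n, "file_id": f} for n, f in variables],
    }
```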

Response format

The response is a JSON object with a data field that contains the analysis results as a string.

{
    "data": "The analysis results"
}

Example: Extract text from an image

This example shows how to extract text from an image using the vision endpoint.

Upload an image

Before you can analyze an image, you need to upload it to Writer.

The following code samples demonstrate how to upload an image and print the File ID. You need the File ID to pass to the Vision endpoint.

curl -X POST 'https://api.writer.com/v1/files' \
  -H 'Content-Type: image/jpeg' \
  -H 'Content-Disposition: attachment; filename=handwriting.jpg' \
  -H "Authorization: Bearer $WRITER_API_KEY" \
  --data-binary "@path/to/file/handwriting.jpg"

Learn more about uploading and managing files.

Generate a caption with the Vision endpoint

Once you have the File IDs for any images you want to analyze, you can pass them to the Vision endpoint along with a prompt.

The prompt must include the names of the images you’re analyzing, referencing them as {{name}}, where name is the name you provided in the variables array. For example: Extract the text from the image {{name}}. If you include files in the variables array that you don’t include in the prompt, the API returns an error.

The following code sample shows the API call to extract text from an image.

curl -X POST \
  'https://api.writer.com/v1/vision' \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $WRITER_API_KEY" \
  --data '{
    "model": "palmyra-vision",
    "prompt": "Extract the text from the image {{handwriting}}.",
    "variables": [{"name": "handwriting", "file_id": "f1234"}]
  }'

The response is a JSON object with a data field that contains the analysis results as a string.

{
    "data": "..."
}
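The curl call above can be sketched end to end in Python with the standard library. This is an illustrative REST call, not the SDK's own method; the function names are assumptions.

```python
import json
import urllib.request

def parse_vision_response(body):
    """Pull the analysis string out of the response JSON's data field."""
    return body["data"]

def extract_text(file_id, api_key):
    """Call POST /v1/vision to extract text from an uploaded image."""
    payload = {
        "model": "palmyra-vision",
        "prompt": "Extract the text from the image {{handwriting}}.",
        "variables": [{"name": "handwriting", "file_id": file_id}],
    }
    req = urllib.request.Request(
        "https://api.writer.com/v1/vision",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return parse_vision_response(json.load(resp))
```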
