Prerequisites
- Python 3.8 or higher installed
- Poetry installed (see their installation guide)
- A Writer API key (follow the Quickstart to obtain an API key)
Getting started
To get started with Instructor, you’ll need to install the library and set up your environment.
1
Obtain an API key
First, make sure that you’ve signed up for a Writer AI Studio account and obtained an API key. You can follow the Quickstart to obtain an API key.
2
Install instructor
Once you’ve done so, install `instructor` with Writer support by running:
3
Set the `WRITER_API_KEY` environment variable
Make sure to set the `WRITER_API_KEY` environment variable with your Writer API key or pass it as an argument to the `Writer` constructor.
Basic usage
The following is a simple example of how to use `instructor` with Writer:
The example defines a Pydantic model with two fields, `name` and `age`. It then uses the `instructor.from_writer` function to create a `client` object that uses the Writer API to extract structured data from text.
Instructor modes
Instructor supports two modes for structured output with Writer:
- JSON mode: Returns a response that strictly follows the provided JSON schema.
- Tools mode: Returns a structured response using the `tool_calls` field, where it populates the defined tool and its arguments based on the input.
JSON mode
JSON mode uses Writer’s native JSON schema support. When you use this mode:
- Instructor converts your Pydantic model into a JSON schema and passes it as the `response_format` parameter
- The model generates a response that strictly follows the provided JSON schema
- Instructor then parses the JSON response and validates it against your original Pydantic model
For more details on structured outputs and JSON schema usage, see our structured output documentation.
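The three JSON mode steps can be sketched with the standard library alone. This illustrates the flow and is not Instructor’s implementation: a dataclass stands in for the Pydantic model, `to_json_schema` and `parse_and_validate` are hypothetical helper names, and the reply string is made up.

```python
import json
from dataclasses import dataclass
from typing import get_type_hints


# Assumption: a plain dataclass stands in for the Pydantic model.
@dataclass
class User:
    name: str
    age: int


# Only the few primitive types this sketch needs.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def to_json_schema(model: type) -> dict:
    """Step 1: derive a JSON schema from the model's type hints."""
    hints = get_type_hints(model)
    return {
        "type": "object",
        "properties": {name: {"type": _JSON_TYPES[tp]} for name, tp in hints.items()},
        "required": list(hints),
    }


def parse_and_validate(raw: str, model: type):
    """Step 3: parse the JSON text and check each field against the model."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for name, tp in get_type_hints(model).items():
        if not isinstance(data.get(name), tp):
            raise ValueError(f"field {name!r} must be of type {tp.__name__}")
    return model(**data)


schema = to_json_schema(User)
# Step 2 would send `schema` along with the request; this reply is made up.
user = parse_and_validate('{"name": "John", "age": 30}', User)
```

In practice Pydantic handles far more types and constraints than this mapping does, which is the main reason to let Instructor do the conversion.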
Tools mode
Tools mode uses Writer’s tool calling capabilities. When you use this mode:
- Instructor converts your Pydantic model into a tool definition
- Instructor adds this tool definition to the API call using the `tools` parameter
- The model returns a structured response using the `tool_calls` field, where it populates the defined tool and its arguments based on the input
- Instructor extracts the arguments from the `tool_calls` field and validates them against the original Pydantic model
For more details on tool calling, see our tool calling documentation.
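As a rough illustration of the Tools mode flow, the sketch below decodes the arguments of a tool call and checks them against the tool definition. The tool definition and the response payload are hand-written assumptions in the common chat-completion shape, not captured Writer API output, and `extract_tool_arguments` and `check_required` are hypothetical helpers.

```python
import json

# Hypothetical tool definition derived from a `User` model with `name` and `age`.
user_tool = {
    "type": "function",
    "function": {
        "name": "User",
        "parameters": {
            "type": "object",
            "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
            "required": ["name", "age"],
        },
    },
}

# Made-up response payload; its exact shape is an assumption for illustration.
response = {
    "choices": [
        {
            "message": {
                "tool_calls": [
                    {
                        "function": {
                            "name": "User",
                            "arguments": '{"name": "John", "age": 30}',
                        }
                    }
                ]
            }
        }
    ]
}


def extract_tool_arguments(response: dict, tool_name: str) -> dict:
    """Find the call to `tool_name` and decode its JSON-encoded arguments."""
    for call in response["choices"][0]["message"]["tool_calls"]:
        if call["function"]["name"] == tool_name:
            return json.loads(call["function"]["arguments"])
    raise LookupError(f"no tool call named {tool_name!r}")


def check_required(arguments: dict, tool: dict) -> dict:
    """Reject arguments that omit a field the tool definition requires."""
    required = tool["function"]["parameters"]["required"]
    missing = [key for key in required if key not in arguments]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    return arguments


arguments = check_required(extract_tool_arguments(response, "User"), user_tool)
```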
Specifying the mode
You can specify which mode to use when initializing the Instructor client:
If you don’t specify a mode, Instructor uses `WRITER_TOOLS` by default.
Instructor vs. Writer SDK
When working with structured outputs, you can choose between two main approaches: using Instructor, a high-level framework built on top of LLM providers, or using the Writer SDK directly, which offers low-level access to the Writer LLMs. Each option comes with its own strengths and trade-offs, depending on your goals and the level of control you need.
Feature | Instructor | Writer SDK |
---|---|---|
Retry mechanism | Built-in retry logic with error handling and prompt editing | Manual implementation required |
Schema handling | Auto-converts a wide range of Python types into JSON Schema via Pydantic | Manual schema conversion required |
Validation & reliability | Enhanced outcomes through Pydantic validation + retries | Same results possible with manual implementation |
Prompt optimization | Built-in prompt minimization and optimization | Manual prompt engineering required |
Model access | Limited to supported models (currently V4 + Tool Calls) | Full access to latest models (V4, V5) and all output modes |
Update speed | Requires updates/contributions for new features | Immediate support for latest Palmyra features |
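To make the "manual implementation required" entry for retries concrete, here is a minimal sketch of a hand-rolled retry loop that feeds the validation error back into the prompt. The `generate` callable is a hypothetical stand-in for whatever function sends a prompt to the model and returns its text; the `age` check is an arbitrary example rule.

```python
import json
from typing import Callable


def extract_with_retries(
    generate: Callable[[str], str], prompt: str, max_retries: int = 3
) -> dict:
    """Call `generate` until it returns valid JSON with an integer `age`."""
    last_error = None
    for _ in range(max_retries):
        if last_error is None:
            full_prompt = prompt
        else:
            # Tell the model what went wrong so it can correct itself.
            full_prompt = (
                f"{prompt}\n\nThe previous answer was invalid: {last_error}. "
                "Return valid JSON."
            )
        raw = generate(full_prompt)
        try:
            data = json.loads(raw)  # raises a ValueError subclass on bad JSON
            if not isinstance(data, dict):
                raise ValueError("expected a JSON object")
            if not isinstance(data.get("age"), int):
                raise ValueError("age must be an integer")
            return data
        except ValueError as exc:
            last_error = exc
    raise RuntimeError(f"no valid response after {max_retries} attempts: {last_error}")
```

With Instructor, the equivalent behaviour comes from the library’s built-in retry logic rather than from code like this.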
Building a data repair tool with Instructor and Writer
You can also use Instructor to do advanced data extraction and repair. In this example, you’ll build a Python application that extracts structured data from text, CSV, and PDF files using Instructor and Writer. This application will:
- Parse text, CSV, and PDF files
- Extract and validate structured data using Instructor and Writer
- Output the results in CSV format
Setting up the project
1
Create a new project
First, create a new project and set up Poetry for dependency management:
2
Add dependencies
Add the required dependencies to your project:
3
Set up your environment variables
Create a `.env` file in your project root and add your Writer API key:
4
Create `main.py` file and add imports
Create a `main.py` file and add the following imports:
Here’s what each import is used for:
- `asyncio`: This is used to run the application on multiple files concurrently.
- `csv`: This is used to write the extracted data to a CSV file.
- `json`: This is used to write the extracted data to a JSON file.
- `os`: This is used to read the files.
- `instructor`: The `instructor` library is used for structured output.
- `writerai`: This is the Writer Python SDK, which is used to interact with the Writer API.
- `typing` and `pydantic`: These modules are used to define the types for fields in the `UserExtract` class defined in the next step.
- `dotenv`: The `dotenv` module is used to load the `.env` file that contains your Writer API key.
5
Setting up Writer client
Initialize the Writer client for both synchronous and asynchronous operations:
Defining the data model
For Instructor to extract structured output, you need to define a data model using Pydantic. To define the data model, create a `UserExtract` class to represent the data you want to extract:
The `first_name` and `last_name` fields are validated to ensure they start with an uppercase letter and contain only letters. In this example, the `email` field is a simple string field, though you could also use a Pydantic field to validate the email format.
Parsing the files
With the data model defined, you can now implement file parsing. This involves creating functions to open the files and extract the text.
1
Create a function to handle file processing
Implement the main file handler function that orchestrates the entire process:
This function handles the file processing logic, including file type validation, text extraction, data repair, and CSV generation.
2
Create a function to read the files
Next, create a function to read the files based on the given path and extension:
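A minimal sketch of such a reader is shown below. The function names, the set of supported extensions, and the choice to return PDF content as raw bytes for a later upload step are assumptions made for illustration.

```python
import os
from typing import Union

# Assumption: these are the file types the application accepts.
SUPPORTED_EXTENSIONS = {".txt", ".csv", ".pdf"}


def get_extension(path: str) -> str:
    """Return the lowercase extension, rejecting unsupported file types."""
    extension = os.path.splitext(path)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {extension or '(none)'}")
    return extension


def read_file(path: str) -> Union[str, bytes]:
    """Read text and CSV files as text, and PDF files as raw bytes."""
    if get_extension(path) == ".pdf":
        with open(path, "rb") as file:
            return file.read()
    with open(path, "r", encoding="utf-8") as file:
        return file.read()
```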
3
Extract the file content
Next, create a function to extract the text from the files. For text files, this function simply reads the file contents. For PDFs, the function uploads the PDF using Writer’s file upload endpoint, parses the text using the PDF parsing tool, and then deletes the file from Writer’s servers using the file delete endpoint:
Extracting and repairing the data
With the file content extracted, you can now implement data extraction and repair using Instructor and Writer.
1
Create a function to repair the data
Create a function to extract and repair data using Instructor:
2
Implementing CSV generation
Add a function to save the extracted data to CSV:
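One way to write such a function is with the standard library’s `csv.DictWriter`. In this sketch the column names follow the `UserExtract` fields mentioned earlier, while the function name and the use of plain dictionaries as rows are assumptions.

```python
import csv
from typing import Dict, List

# Column order for the output file, matching the UserExtract fields.
FIELDNAMES = ["first_name", "last_name", "email"]


def save_to_csv(rows: List[Dict[str, str]], path: str) -> None:
    """Write one header row followed by one row per extracted user."""
    # newline="" lets the csv module control line endings on every platform.
    with open(path, "w", newline="", encoding="utf-8") as file:
        writer = csv.DictWriter(file, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)
```

If the extracted records are Pydantic models rather than dictionaries, they would need to be converted to dictionaries before being passed in.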
Creating the main handler
Finally, implement the main function to process multiple files concurrently:
Testing the application
Your data repair tool is now ready to use. To test it, follow these steps:
1
Create an `example_data` directory
Create an `example_data` directory and add some test files:
- A text file with user information
- A PDF file with user information
Then update the `main.py` file to point to the new files.
2
Run the application
Run the application:
The application will process both files concurrently and generate CSV files containing the extracted user information.
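The concurrent processing described above can be sketched with `asyncio.gather`. Here `handle_file` is a placeholder that only sleeps briefly and returns a made-up output path; in the real application it would extract the text, repair the data, and write the CSV. The file names are hypothetical.

```python
import asyncio
from typing import List


async def handle_file(path: str) -> str:
    """Placeholder for the real per-file work; returns a made-up output path."""
    await asyncio.sleep(0.01)  # stands in for network and file I/O
    return f"{path}.csv"


async def main(paths: List[str]) -> List[str]:
    """Start one task per file and wait for all of them to finish."""
    return await asyncio.gather(*(handle_file(path) for path in paths))


if __name__ == "__main__":
    outputs = asyncio.run(main(["example_data/users.txt", "example_data/users.pdf"]))
    print(outputs)
```

`asyncio.gather` returns results in the same order as the inputs, so each output can be matched back to its source file.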