This guide shows you how to configure Azure AI Content Safety guardrails with AI Studio. After completing this setup, AI Studio can use Azure Content Safety to moderate harmful content and detect prompt injection attacks across your AI agents. AI Studio supports two Azure Content Safety guardrail types:
  • Text Moderation: Analyzes text for harmful content categories with configurable severity thresholds
  • Prompt Shields: Detects prompt injection attacks in user inputs and malicious content embedded in documents
These are configured as separate guardrails in AI Studio, allowing you to enable one or both depending on your safety requirements. For an overview of how guardrails work in AI Studio, see Configure guardrails.

Prerequisites

Before configuring Azure Content Safety guardrails in AI Studio, you need:
  • An Azure account with an active subscription
  • An Azure AI Content Safety resource deployed in a supported region
  • The API key and endpoint for your Content Safety resource
  • Enterprise plan access to AI Studio

Create a content safety resource and get credentials

If you haven’t created an Azure AI Content Safety resource yet, follow these steps:
  1. Sign in to the Azure portal
  2. Select Create a resource and search for Content Safety
  3. Select Azure AI Content Safety and select Create
  4. Configure your resource:
    • Select your Subscription and Resource group
    • Choose a supported region
    • Enter a unique Name for your resource
    • Select a Pricing tier
  5. Select Review + create, then Create
  6. After deployment completes, navigate to your resource and go to Resource Management > Keys and Endpoint
  7. Copy one of the Keys (either KEY 1 or KEY 2) and the Endpoint URL (for example, https://your-resource-name.cognitiveservices.azure.com/) for use in AI Studio
For more information on creating and managing Azure Content Safety resources, see the Azure Content Safety documentation.

Configure in AI Studio

When adding an Azure Content Safety guardrail in AI Studio, provide these parameters:
  • API key: Your Azure Content Safety API key (from the Azure portal)
  • API base: Your Content Safety endpoint URL (for example, https://your-resource-name.cognitiveservices.azure.com/)
Text Moderation guardrails also accept the following optional parameters:
  • Categories: The content categories to evaluate. Select one or more: Hate, SelfHarm, Sexual, Violence. Default: all categories selected.
  • Severity threshold: The minimum severity level (0–6) that triggers a block. Lower values are more strict. For example, a threshold of 2 blocks content with a severity score of 2 or higher. Default: 2.
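The threshold rule can be sketched in a few lines of Python. This is an illustration of the blocking logic described above, not AI Studio's actual implementation; `should_block` is a hypothetical helper.

```python
# Sketch of the Text Moderation blocking rule: a request is blocked when any
# selected category's severity score meets or exceeds the configured threshold.
# `should_block` is a hypothetical helper for illustration only.

def should_block(categories_analysis, threshold=2,
                 categories=("Hate", "SelfHarm", "Sexual", "Violence")):
    """Return True if any selected category meets or exceeds the threshold."""
    return any(
        item["severity"] >= threshold
        for item in categories_analysis
        if item["category"] in categories
    )

# Example response fragment in the shape returned by the text:analyze API
analysis = [
    {"category": "Hate", "severity": 0},
    {"category": "Violence", "severity": 3},
]

print(should_block(analysis, threshold=2))            # blocked: Violence >= 2
print(should_block(analysis, categories=("Hate",)))   # allowed: Hate is below 2
```

Note that restricting the evaluated categories means severities in the unselected categories are ignored entirely, as the second call shows.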
By default, a guardrail applies to all teams in your organization. You can restrict it to specific teams during configuration. See Team scoping for details.

Text Moderation

Azure Content Safety Text Moderation analyzes text for harmful content across four categories:
  • Hate: Discriminatory or prejudiced content targeting identity groups
  • Sexual: Sexually explicit or suggestive content
  • Self-Harm: Content related to self-harm or suicide
  • Violence: Violent, threatening, or graphic content

How Text Moderation works with AI Studio

When configured as a guardrail, Text Moderation checks text content against each category and blocks the request if any category’s severity score meets or exceeds the configured threshold. Text Moderation supports the following guardrail modes:
  • Pre-call: Check user input before the LLM processes it. Blocks harmful prompts and saves LLM costs.
  • Post-call: Check LLM output before returning it to the user. Catches harmful content the LLM might generate.
For more information on severity levels and content categories, see Harm categories in the Azure documentation.

Prompt Shields

Azure Content Safety Prompt Shields detects prompt injection attacks—attempts by users or embedded documents to manipulate an AI model into bypassing its safety rules or instructions. Prompt Shields detects two types of attacks:
  • User Prompt attacks: Malicious instructions in user input designed to override the model's system prompt or safety behaviors
  • Document attacks: Harmful commands embedded in documents or external content that the model processes

How Prompt Shields works with AI Studio

When configured as a guardrail, Prompt Shields analyzes user prompts and any associated documents for injection attacks. If an attack is detected, AI Studio blocks the request. Prompt Shields supports the following guardrail modes:
  • Pre-call: Check user input and documents before the LLM call. Blocks prompt injection attempts and prevents the LLM from processing malicious content.
  • During-call: Check input in parallel with the LLM call for lower latency.
For more information on prompt injection detection, see Prompt Shields in the Azure documentation.
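The blocking decision described above can be sketched from a shieldPrompt response. This is an illustrative helper under the assumption that a request is blocked when either analysis flags an attack; `attack_detected` is hypothetical, not an AI Studio API.

```python
# Sketch: block when the shieldPrompt response flags an attack in the user
# prompt or in any supplied document. `attack_detected` is a hypothetical
# helper for illustration only.

def attack_detected(response):
    """Return True if Prompt Shields flagged the prompt or any document."""
    if response.get("userPromptAnalysis", {}).get("attackDetected"):
        return True
    return any(
        doc.get("attackDetected")
        for doc in response.get("documentsAnalysis", [])
    )

clean = {
    "userPromptAnalysis": {"attackDetected": False},
    "documentsAnalysis": [{"attackDetected": False}],
}
injected = {
    "userPromptAnalysis": {"attackDetected": True},
    "documentsAnalysis": [],
}

print(attack_detected(clean))     # no attack found
print(attack_detected(injected))  # user prompt attack detected
```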

Verify your setup

Before configuring Azure Content Safety in AI Studio, test your resource directly to confirm your API key and endpoint work.

Test Text Moderation

Run this cURL command, replacing <endpoint> and <your_api_key> with your values:
curl --location --request POST '<endpoint>/contentsafety/text:analyze?api-version=2024-09-01' \
--header 'Ocp-Apim-Subscription-Key: <your_api_key>' \
--header 'Content-Type: application/json' \
--data-raw '{
  "text": "This is a test message",
  "categories": ["Hate", "Sexual", "SelfHarm", "Violence"]
}'
A successful response returns severity scores for each category:
{
  "categoriesAnalysis": [
    { "category": "Hate", "severity": 0 },
    { "category": "SelfHarm", "severity": 0 },
    { "category": "Sexual", "severity": 0 },
    { "category": "Violence", "severity": 0 }
  ]
}

Test Prompt Shields

Run this cURL command, replacing <endpoint> and <your_api_key> with your values:
curl --location --request POST '<endpoint>/contentsafety/text:shieldPrompt?api-version=2024-09-01' \
--header 'Ocp-Apim-Subscription-Key: <your_api_key>' \
--header 'Content-Type: application/json' \
--data-raw '{
  "userPrompt": "This is a test message",
  "documents": ["This is a test document"]
}'
A successful response indicates whether an attack was detected:
{
  "userPromptAnalysis": { "attackDetected": false },
  "documentsAnalysis": [{ "attackDetected": false }]
}
If either command returns an error, check that your API key and endpoint are correct and that your Content Safety resource is deployed in a supported region.
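If you prefer Python over cURL, the two test requests above can be assembled with the standard library. This sketch only constructs the URL, headers, and body to match the cURL commands; `build_request` is a hypothetical helper, and actually sending the request requires your real endpoint and key.

```python
import json

API_VERSION = "2024-09-01"

def build_request(endpoint, api_key, operation, body):
    """Build the URL, headers, and JSON payload for a Content Safety call.

    `operation` is "text:analyze" or "text:shieldPrompt", matching the
    cURL commands above. Hypothetical helper for illustration only.
    """
    url = f"{endpoint.rstrip('/')}/contentsafety/{operation}?api-version={API_VERSION}"
    headers = {
        "Ocp-Apim-Subscription-Key": api_key,
        "Content-Type": "application/json",
    }
    return url, headers, json.dumps(body)

url, headers, payload = build_request(
    "https://your-resource-name.cognitiveservices.azure.com/",
    "<your_api_key>",
    "text:analyze",
    {"text": "This is a test message",
     "categories": ["Hate", "Sexual", "SelfHarm", "Violence"]},
)
print(url)
# Send with e.g. urllib.request once your endpoint and key are in place.
```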

Next steps