This guide shows you how to configure Azure AI Content Safety guardrails with AI Studio. After completing this setup, AI Studio can use Azure Content Safety to moderate harmful content and detect prompt injection attacks across your AI agents.
AI Studio supports two Azure Content Safety guardrail types:
- Text Moderation: Analyzes text for harmful content categories with configurable severity thresholds
- Prompt Shields: Detects prompt injection attacks in user inputs and malicious content embedded in documents
These are configured as separate guardrails in AI Studio, allowing you to enable one or both depending on your safety requirements.
For an overview of how guardrails work in AI Studio, see Configure guardrails.
Prerequisites
Before configuring Azure Content Safety guardrails in AI Studio, you need:
- An Azure account with an active subscription
- An Azure AI Content Safety resource deployed in a supported region
- The API key and endpoint for your Content Safety resource
- Enterprise plan access to AI Studio
Create a content safety resource and get credentials
If you haven’t created an Azure AI Content Safety resource yet, follow these steps:
- Sign in to the Azure portal
- Select Create a resource and search for Content Safety
- Select Azure AI Content Safety and select Create
- Configure your resource:
- Select your Subscription and Resource group
- Choose a supported region
- Enter a unique Name for your resource
- Select a Pricing tier
- Select Review + create, then Create
- After deployment completes, navigate to your resource and go to Resource Management > Keys and Endpoint
- Copy one of the Keys (either KEY 1 or KEY 2) and the Endpoint URL (for example, https://your-resource-name.cognitiveservices.azure.com/) for use in AI Studio
For more information on creating and managing Azure Content Safety resources, see the Azure Content Safety documentation.
When adding an Azure Content Safety guardrail in AI Studio, provide these parameters:
| Parameter | Description |
|---|---|
| API key | Your Azure Content Safety API key (from the Azure portal) |
| API base | Your Content Safety endpoint URL (for example, https://your-resource-name.cognitiveservices.azure.com/) |
Text Moderation guardrails also accept the following optional parameters:
| Parameter | Description | Default |
|---|---|---|
| Categories | The content categories to evaluate. Select one or more: Hate, SelfHarm, Sexual, Violence. | All categories selected |
| Severity threshold | The minimum severity level (0–6) that triggers a block. Lower values are stricter. For example, a threshold of 2 blocks content with a severity score of 2 or higher. | 2 |
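The blocking rule these parameters describe can be sketched as a simple check: a request is blocked when any selected category's severity meets or exceeds the threshold. This is an illustrative sketch, not AI Studio's internal implementation; the function and parameter names are hypothetical, but the response shape matches the Azure text:analyze API.

```python
def should_block(categories_analysis, selected_categories, threshold=2):
    """Return True if any selected category's severity >= threshold."""
    return any(
        item["severity"] >= threshold
        for item in categories_analysis
        if item["category"] in selected_categories
    )

# Example response where only Violence scored above zero.
analysis = [
    {"category": "Hate", "severity": 0},
    {"category": "SelfHarm", "severity": 0},
    {"category": "Sexual", "severity": 0},
    {"category": "Violence", "severity": 4},
]

print(should_block(analysis, {"Hate", "SelfHarm", "Sexual", "Violence"}))  # True
print(should_block(analysis, {"Hate", "Sexual"}))  # False: Violence isn't selected
```

Raising the threshold to 6 would allow this example through, since no category reaches severity 6.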
By default, a guardrail applies to all teams in your organization. You can restrict it to specific teams during configuration. See Team scoping for details.
Text Moderation
Azure Content Safety Text Moderation analyzes text for harmful content across four categories:
| Category | Description |
|---|---|
| Hate | Discriminatory or prejudiced content targeting identity groups |
| Sexual | Sexually explicit or suggestive content |
| Self-Harm | Content related to self-harm or suicide |
| Violence | Violent, threatening, or graphic content |
How Text Moderation works with AI Studio
When configured as a guardrail, Text Moderation checks text content against each category and blocks the request if any category’s severity score meets or exceeds the configured threshold.
Text Moderation supports the following guardrail modes:
- Pre-call: Check user input before the LLM processes it. Blocks harmful prompts and saves LLM costs.
- Post-call: Check LLM output before returning it to the user. Catches harmful content the LLM might generate.
For more information on severity levels and content categories, see Harm categories in the Azure documentation.
Prompt Shields
Azure Content Safety Prompt Shields detects prompt injection attacks: attempts by users or embedded documents to manipulate an AI model into bypassing its safety rules or instructions.
Prompt Shields detects two types of attacks:
| Attack type | Description |
|---|---|
| User Prompt attacks | Malicious instructions in user input designed to override the model's system prompt or safety behaviors |
| Document attacks | Harmful commands embedded in documents or external content that the model processes |
How Prompt Shields works with AI Studio
When configured as a guardrail, Prompt Shields analyzes user prompts and any associated documents for injection attacks. If an attack is detected, AI Studio blocks the request.
Prompt Shields supports the following guardrail modes:
- Pre-call: Check user input and documents before the LLM call. Blocks prompt injection attempts and prevents the LLM from processing malicious content.
- During-call: Check input in parallel with the LLM call for lower latency.
For more information on prompt injection detection, see Prompt Shields in the Azure documentation.
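The blocking decision described above can be sketched as follows: a request is blocked if an attack is detected in the user prompt or in any attached document. The function name is illustrative, not part of AI Studio's API, but the response shape matches the Azure shieldPrompt API.

```python
def attack_detected(response):
    """Return True if Prompt Shields flagged the user prompt or any document."""
    if response.get("userPromptAnalysis", {}).get("attackDetected"):
        return True
    return any(
        doc.get("attackDetected", False)
        for doc in response.get("documentsAnalysis", [])
    )

clean = {
    "userPromptAnalysis": {"attackDetected": False},
    "documentsAnalysis": [{"attackDetected": False}],
}
flagged = {
    "userPromptAnalysis": {"attackDetected": False},
    "documentsAnalysis": [{"attackDetected": True}],
}

print(attack_detected(clean))    # False
print(attack_detected(flagged))  # True: a document attack blocks the request
```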
Verify your setup
Before configuring Azure Content Safety in AI Studio, test your resource directly to confirm your API key and endpoint work.
Test Text Moderation
Run this cURL command, replacing <endpoint> and <your_api_key> with your values:
```shell
curl --location --request POST '<endpoint>/contentsafety/text:analyze?api-version=2024-09-01' \
--header 'Ocp-Apim-Subscription-Key: <your_api_key>' \
--header 'Content-Type: application/json' \
--data-raw '{
  "text": "This is a test message",
  "categories": ["Hate", "Sexual", "SelfHarm", "Violence"]
}'
```
A successful response returns severity scores for each category:
```json
{
  "categoriesAnalysis": [
    { "category": "Hate", "severity": 0 },
    { "category": "SelfHarm", "severity": 0 },
    { "category": "Sexual", "severity": 0 },
    { "category": "Violence", "severity": 0 }
  ]
}
```
Test Prompt Shields
Run this cURL command, replacing <endpoint> and <your_api_key> with your values:
```shell
curl --location --request POST '<endpoint>/contentsafety/text:shieldPrompt?api-version=2024-09-01' \
--header 'Ocp-Apim-Subscription-Key: <your_api_key>' \
--header 'Content-Type: application/json' \
--data-raw '{
  "userPrompt": "This is a test message",
  "documents": ["This is a test document"]
}'
```
A successful response indicates whether an attack was detected:
```json
{
  "userPromptAnalysis": { "attackDetected": false },
  "documentsAnalysis": [{ "attackDetected": false }]
}
```
If either command returns an error, check that your API key and endpoint are correct and that your Content Safety resource is deployed in a supported region.
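If you prefer to verify from Python rather than cURL, the same text:analyze request can be built with the standard library. This is a sketch: the endpoint and key below are placeholders, and the commented-out urlopen call is where the request would actually be sent.

```python
import json
import urllib.request

def build_analyze_request(endpoint, api_key, text):
    """Build the text:analyze POST request without sending it."""
    url = f"{endpoint.rstrip('/')}/contentsafety/text:analyze?api-version=2024-09-01"
    body = json.dumps({
        "text": text,
        "categories": ["Hate", "Sexual", "SelfHarm", "Violence"],
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Ocp-Apim-Subscription-Key": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_analyze_request(
    "https://your-resource-name.cognitiveservices.azure.com/",
    "<your_api_key>",
    "This is a test message",
)
print(req.full_url)
# Uncomment to send the request with real credentials:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```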
Next steps