This guide shows you how to configure Azure AI Content Safety guardrails with AI Studio. After completing this setup, AI Studio can use Azure Content Safety to moderate harmful content and detect prompt injection attacks across your AI agents. AI Studio supports two Azure Content Safety guardrail types:
  • Text Moderation: Analyzes text for harmful content categories with configurable severity thresholds
  • Prompt Shields: Detects prompt injection attacks in user inputs and malicious content embedded in documents
These are configured as separate guardrails in AI Studio, allowing you to enable one or both depending on your safety requirements. For an overview of how guardrails work in AI Studio, see Configure guardrails.

Prerequisites

Before configuring Azure Content Safety guardrails in AI Studio, you need:
  • An Azure account with an active subscription
  • An Azure AI Content Safety resource deployed in a supported region
  • The API key and endpoint for your Content Safety resource
  • Enterprise plan access to AI Studio

Create a content safety resource and get credentials

If you haven’t created an Azure AI Content Safety resource yet, follow these steps:
  1. Sign in to the Azure portal
  2. Select Create a resource and search for Content Safety
  3. Select Azure AI Content Safety and select Create
  4. Configure your resource:
    • Select your Subscription and Resource group
    • Choose a supported region
    • Enter a unique Name for your resource
    • Select a Pricing tier
  5. Select Review + create, then Create
  6. After deployment completes, navigate to your resource and go to Resource Management > Keys and Endpoint
  7. Copy one of the Keys (either KEY 1 or KEY 2) and the Endpoint URL (for example, https://your-resource-name.cognitiveservices.azure.com/) for use in AI Studio
For more information on creating and managing Azure Content Safety resources, see the Azure Content Safety documentation.

Configure in AI Studio

When adding an Azure Content Safety guardrail in AI Studio, provide these parameters:
  • API key: Your Azure Content Safety API key (from the Azure portal)
  • API base: Your Content Safety endpoint URL (for example, https://your-resource-name.cognitiveservices.azure.com/)
Text Moderation guardrails also accept the following optional parameters:
  • Categories: The content categories to evaluate. Select one or more: Hate, SelfHarm, Sexual, Violence. Default: all categories selected.
  • Severity threshold: The minimum severity level (0–6) that triggers a block. Lower values are more strict. For example, a threshold of 2 blocks content with a severity score of 2 or higher. Default: 2.
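The threshold rule can be sketched in a few lines of Python. This is an illustration of the blocking logic described above, not AI Studio's actual implementation; `should_block` is a hypothetical helper.

```python
# Sketch of the Text Moderation blocking rule: a request is blocked when any
# selected category's severity score meets or exceeds the configured threshold.
# `should_block` is a hypothetical helper for illustration only.

def should_block(categories_analysis, threshold=2,
                 categories=("Hate", "SelfHarm", "Sexual", "Violence")):
    """Return True if any selected category meets or exceeds the threshold."""
    return any(
        item["severity"] >= threshold
        for item in categories_analysis
        if item["category"] in categories
    )

# Example response fragment in the shape returned by the text:analyze API
analysis = [
    {"category": "Hate", "severity": 0},
    {"category": "Violence", "severity": 3},
]

print(should_block(analysis, threshold=2))            # blocked: Violence >= 2
print(should_block(analysis, categories=("Hate",)))   # allowed: Hate is below 2
```

Note that restricting the evaluated categories means severities in the unselected categories are ignored entirely, as the second call shows.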
By default, a guardrail applies to all teams in your organization. You can restrict it to specific teams during configuration. See Team scoping for details.

Text Moderation

Azure Content Safety Text Moderation analyzes text for harmful content across four categories:
  • Hate: Discriminatory or prejudiced content targeting identity groups
  • Sexual: Sexually explicit or suggestive content
  • Self-Harm: Content related to self-harm or suicide
  • Violence: Violent, threatening, or graphic content

How Text Moderation works with AI Studio

When configured as a guardrail, Text Moderation checks text content against each category and blocks the request if any category’s severity score meets or exceeds the configured threshold. Text Moderation supports the following guardrail modes:
  • Pre-call: Check user input before the LLM processes it. Blocks harmful prompts and saves LLM costs.
  • Post-call: Check LLM output before returning it to the user. Catches harmful content the LLM might generate.
For more information on severity levels and content categories, see Harm categories in the Azure documentation.

Prompt Shields

Azure Content Safety Prompt Shields detects prompt injection attacks—attempts by users or embedded documents to manipulate an AI model into bypassing its safety rules or instructions. Prompt Shields detects two types of attacks:
  • User Prompt attacks: Malicious instructions in user input designed to override the model's system prompt or safety behaviors
  • Document attacks: Harmful commands embedded in documents or external content that the model processes

How Prompt Shields works with AI Studio

When configured as a guardrail, Prompt Shields analyzes user prompts and any associated documents for injection attacks. If an attack is detected, AI Studio blocks the request. Prompt Shields supports the following guardrail modes:
  • Pre-call: Check user input and documents before the LLM call. Blocks prompt injection attempts and prevents the LLM from processing malicious content.
  • During-call: Check input in parallel with the LLM call for lower latency.
For more information on prompt injection detection, see Prompt Shields in the Azure documentation.
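The blocking decision described above can be sketched from a shieldPrompt response. This is an illustrative helper under the assumption that a request is blocked when either analysis flags an attack; `attack_detected` is hypothetical, not an AI Studio API.

```python
# Sketch: block when the shieldPrompt response flags an attack in the user
# prompt or in any supplied document. `attack_detected` is a hypothetical
# helper for illustration only.

def attack_detected(response):
    """Return True if Prompt Shields flagged the prompt or any document."""
    if response.get("userPromptAnalysis", {}).get("attackDetected"):
        return True
    return any(
        doc.get("attackDetected")
        for doc in response.get("documentsAnalysis", [])
    )

clean = {
    "userPromptAnalysis": {"attackDetected": False},
    "documentsAnalysis": [{"attackDetected": False}],
}
injected = {
    "userPromptAnalysis": {"attackDetected": True},
    "documentsAnalysis": [],
}

print(attack_detected(clean))     # no attack found
print(attack_detected(injected))  # user prompt attack detected
```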

Verify your setup

Before configuring Azure Content Safety in AI Studio, test your resource directly to confirm your API key and endpoint work.

Test Text Moderation

Run this cURL command, replacing <endpoint> and <your_api_key> with your values:
curl --location --request POST '<endpoint>/contentsafety/text:analyze?api-version=2024-09-01' \
--header 'Ocp-Apim-Subscription-Key: <your_api_key>' \
--header 'Content-Type: application/json' \
--data-raw '{
  "text": "This is a test message",
  "categories": ["Hate", "Sexual", "SelfHarm", "Violence"]
}'
A successful response returns severity scores for each category:
{
  "categoriesAnalysis": [
    { "category": "Hate", "severity": 0 },
    { "category": "SelfHarm", "severity": 0 },
    { "category": "Sexual", "severity": 0 },
    { "category": "Violence", "severity": 0 }
  ]
}

Test Prompt Shields

Run this cURL command, replacing <endpoint> and <your_api_key> with your values:
curl --location --request POST '<endpoint>/contentsafety/text:shieldPrompt?api-version=2024-09-01' \
--header 'Ocp-Apim-Subscription-Key: <your_api_key>' \
--header 'Content-Type: application/json' \
--data-raw '{
  "userPrompt": "This is a test message",
  "documents": ["This is a test document"]
}'
A successful response indicates whether an attack was detected:
{
  "userPromptAnalysis": { "attackDetected": false },
  "documentsAnalysis": [{ "attackDetected": false }]
}
If either command returns an error, check that your API key and endpoint are correct and that your Content Safety resource is deployed in a supported region.
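If you prefer Python over cURL, the two test requests above can be assembled with the standard library. This sketch only constructs the URL, headers, and body to match the cURL commands; `build_request` is a hypothetical helper, and actually sending the request requires your real endpoint and key.

```python
import json

API_VERSION = "2024-09-01"

def build_request(endpoint, api_key, operation, body):
    """Build the URL, headers, and JSON payload for a Content Safety call.

    `operation` is "text:analyze" or "text:shieldPrompt", matching the
    cURL commands above. Hypothetical helper for illustration only.
    """
    url = f"{endpoint.rstrip('/')}/contentsafety/{operation}?api-version={API_VERSION}"
    headers = {
        "Ocp-Apim-Subscription-Key": api_key,
        "Content-Type": "application/json",
    }
    return url, headers, json.dumps(body)

url, headers, payload = build_request(
    "https://your-resource-name.cognitiveservices.azure.com/",
    "<your_api_key>",
    "text:analyze",
    {"text": "This is a test message",
     "categories": ["Hate", "Sexual", "SelfHarm", "Violence"]},
)
print(url)
# Send with e.g. urllib.request once your endpoint and key are in place.
```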

Next steps