AI Security for Apps AI Gateway WAF

Firewall for AI & LLMs

Complete reference: WAF-integrated LLM threat detection for prompt injection, PII exposure, and unsafe topics — plus AI Gateway security features including Guardrails, DLP, rate limiting, and caching.

🛈 Naming note: "Firewall for AI" was the original product name. It is now officially called AI Security for Apps in Cloudflare documentation, but the underlying capabilities are identical.

What It Is

AI Security for Apps extends the Cloudflare WAF with detections specifically designed for LLM-powered applications. It is model-agnostic — it works regardless of which LLM you use (OpenAI, Anthropic, Google Gemini, Workers AI, self-hosted, etc.).

It is complemented by AI Gateway, a proxy layer (available on all plans) that adds Guardrails, DLP, rate limiting, caching, and full prompt/response logging between your app and the LLM provider.

🔐

Prompt Injection

Detect attackers trying to hijack the LLM's behavior by overriding its system instructions or extracting the system prompt.

👤

PII Detection

Catch users inadvertently or maliciously sending sensitive personal data (SSNs, credit cards, emails) in prompts.

🚫

Unsafe Topics

Block prompts covering harmful subjects — violent crimes, hate speech, self-harm, CSAM, weapons — across 14 categories.

🎯

Custom Topics

Define up to 20 organization-specific topics — competitors, legal advice, internal HR matters — using zero-shot classification.

Plan Availability

ℹ

Requires WAF enabled on the zone. AI detection fields require Enterprise plan + paid add-on. Contact your account team to enable. AI Gateway is available on all plans at no cost.

Capability	Free	Pro	Business	Enterprise
LLM endpoint discovery (`cf-llm` auto-label)	Yes	Yes	Yes	Yes
AI Security Log Mode Ruleset (full prompt logging)	No	No	No	Paid add-on
AI detection fields — PII, injection score, unsafe topics, custom topics	No	No	No	Paid add-on
AI Gateway (rate limiting, caching, logging, Guardrails, DLP)	Yes	Yes	Yes	Yes

Architecture & Traffic Flow

Cloudflare's AI security is a layered stack. Requests pass through WAF-level LLM detection first, then optionally through AI Gateway before reaching the LLM provider.

User / Client Browser or API Consumer | v ┌─────────────────────────────────────────────────────────────┐ │ Cloudflare Edge (anycast — nearest data center) │ │ │ │ 1. LLM Discovery ─→ heuristics ─→ cf-llm label │ │ 2. AI Detection Engine (Enterprise add-on) │ │ ├── PII Detection (fuzzy AI + regex) │ │ ├── Prompt Injection (score 1–99) │ │ └── Unsafe / Custom Topics │ │ 3. WAF Rule Evaluation (cf.llm.* fields populated) │ │ ├── Log / Monitor ──→ Security Analytics │ │ └── Mitigate ──→ Block / Challenge / Rate Limit │ └─────────────────────────────────────────────────────────────┘ | v ┌─────────────────────────────────────────────────────────────┐ │ AI Gateway (all plans) │ │ ├── Authenticated Gateway (cf-aig-authorization header) │ │ ├── Rate Limiting (fixed / sliding window) │ │ ├── DLP (Beta) (prompt + response scanning) │ │ ├── Guardrails (Beta) (content moderation) │ │ ├── Caching (identical prompt→response) │ │ └── Dynamic Routing (model fallback / retry) │ └─────────────────────────────────────────────────────────────┘ | v LLM Provider (OpenAI / Anthropic / Workers AI / Gemini …)

ℹ

Steps 1–3 (WAF layer) require the Enterprise paid add-on. AI Gateway is available on all plans and can be used independently without the WAF add-on.

LLM Endpoint Discovery

Cloudflare automatically detects LLM endpoints using traffic heuristics — no manual configuration required. Once detected, endpoints are labeled cf-llm via API Shield, enabling filtering in Security Analytics and scoping of WAF rules.

How Heuristics Work

Signal	Detail
Response time	LLM endpoints typically respond in >1 second
Effective bitrate	80% of LLM endpoints operate at <4 KB/s (streaming tokens)
False positive filtering	GraphQL endpoints, device heartbeats, QR/OTP generators are filtered out automatically

✅

You can also manually apply the cf-llm label to specific endpoints via Security > Web Assets > Endpoints > Edit endpoint labels, or bulk apply via API Shield's endpoint management API.

⚠

AI Security for Apps currently only scans requests with

Content-Type:
                            application/json

. Non-JSON LLM requests are not scanned.

PII Detection

Two complementary approaches that can be combined for layered protection:

Fuzzy Detection (AI-powered)

Uses Microsoft Presidio to detect PII even in natural language or unexpected formats. Supports 40+ categories:

CREDIT_CARD US_SSN US_PASSPORT US_DRIVER_LICENSE EMAIL_ADDRESS PHONE_NUMBER IP_ADDRESS IBAN_CODE PERSON LOCATION DATE_TIME URL IN_AADHAAR UK_NHS AU_TFN SG_NRIC_FIN + 24 more

🚨

Never block on cf.llm.prompt.pii_detected alone. Broad categories like PERSON, DATE_TIME, and LOCATION appear in normal conversation and will generate large numbers of false positives. Always filter by specific categories using cf.llm.prompt.pii_categories.

Exact Detection (Regex)

Use WAF custom rules with http.request.body.raw matches "PATTERN" for organization-specific formats:

Custom PII type	Example format	Regex pattern
Employee ID	`EMP-482910`	`EMP-[0-9]{6}`
Patient record number	`PAT/2024/00391`	`PAT/[0-9]{4}/[0-9]{5}`
Internal account ID	`ACCT-XX-99999`	`ACCT-[A-Z]{2}-[0-9]{5}`
Custom API key prefix	`sk_live_abc123...`	`sk_live_[a-zA-Z0-9]{20,}`

Prompt Injection Detection

Score-based system using the cf.llm.prompt.injection_score field. Range: 1–99. Lower score = higher injection risk.

1 (Most Dangerous)99 (Safest)

BlockChallengeAllow

Score range	Risk level	Recommended action
1 – 19	High — strongly resembles known injection patterns	Block
20 – 49	Moderate — some injection characteristics, may be ambiguous	Challenge or Log
50 – 99	Low — likely safe, normal user input	Allow

A score-based approach is used rather than a binary result because injection exists on a spectrum. A creative writing request may superficially resemble an injection attempt without actually being one.

💡

Start with a Log action at threshold lt 40. Review results in Security Analytics, then tune down to lt 30 or lt 20 based on actual false positive rates before switching to Block.

Unsafe & Custom Topic Detection

Predefined Unsafe Topics (14 categories)

Category	Description
`S1`	Violent crimes
`S2`	Non-violent crimes
`S3`	Sex-related crimes
`S4`	Child sexual exploitation
`S5`	Defamation
`S6`	Specialized advice
`S7`	Privacy
`S8`	Intellectual property
`S9`	Indiscriminate weapons
`S10`	Hate
`S11`	Suicide and self-harm
`S12`	Sexual content
`S13`	Elections
`S14`	Code interpreter abuse

Custom Topic Detection

Define up to 20 custom topics using zero-shot classification — no model training required. Each topic has a label (used in rules) and a topic string (used by the AI classifier). Scores follow the 1–99 scale (lower = more relevant to the topic).

Constraints

Parameter	Limit
Maximum number of topics	20
Topic string length	2–50 printable ASCII characters
Label length	2–20 characters
Label format	Lowercase letters, numbers, and hyphens only

Define Custom Topics via API

curl "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/firewall-for-ai/custom_topics" \
  --request PUT \
  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  --json '{
    "topics": [
      { "label": "competitors", "topic": "asking about competitor products and pricing" },
      { "label": "legal-advice", "topic": "asking for legal counsel or regulatory guidance" },
      { "label": "hr-internal", "topic": "internal HR policies and employee matters" }
    ]
  }'

⚠

This PUT request replaces your entire topic list. Always include all topics you want to keep, not just new ones.

Topic String Best Practices

Style	Example	Verdict
Verb phrase (recommended)	asking for investment advice	Best precision
Sentence-like	a user seeking financial guidance	Good
Noun phrase	investment advice	Acceptable
Single keyword	finance	Too broad
Vague phrase	bad things	Ineffective

Example Custom Topics

Label	Topic string	Use case
`competitors`	asking about Acme Corp products and pricing	Block chatbot discussing rival offerings
`legal-advice`	asking for legal counsel or regulatory compliance guidance	Block prompts soliciting legal advice
`student-data`	requesting student personal information or academic records	EdTech — prevent student data exposure
`crypto-advice`	asking for cryptocurrency trading or investment recommendations	FinTech — block crypto investment tips
`exec-internal`	discussing internal executive decisions or leadership changes	Prevent internal matter leakage

Detection Fields Reference

These fields are populated by AI Security for Apps on requests hitting cf-llm labeled endpoints and can be used in WAF Custom Rules and Rate Limiting Rules.

Field	Type	Description
`cf.llm.prompt.detected`	Boolean	LLM prompt was detected in the request
`cf.llm.prompt.pii_detected`	Boolean	Any PII found in the prompt — do not block on this alone
`cf.llm.prompt.pii_categories`	Array<String>	PII types found (CREDIT_CARD, US_SSN, EMAIL_ADDRESS, etc.)
`cf.llm.prompt.injection_score`	Number (1–99)	Injection likelihood — lower = more dangerous
`cf.llm.prompt.unsafe_topic_detected`	Boolean	Any predefined unsafe topic detected in prompt
`cf.llm.prompt.unsafe_topic_categories`	Array<String>	Which unsafe categories detected (S1–S14)
`cf.llm.prompt.custom_topic_categories`	Map<Number>	Custom topic relevance scores by label (1–99, lower = more relevant)
`cf.llm.prompt.token_count`	Number	Estimated token count of the prompt — useful for cost-based rate limiting

Log Mode vs Production Mode

Feature	Log Mode	Production Mode
How it works	Pre-built managed ruleset	Custom WAF rules using detection fields
Prompt logging	Yes (encrypted payload logging)	No — metadata only
Response logging	No	No — use AI Gateway
Policy flexibility	Limited — 3 fixed rules	Full — scores, categories, combined signals
Blocking behavior	Default WAF block page	Fully customizable responses
Best for	Evaluation and threshold tuning	Production enforcement

Enable Log Mode via API

curl "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/rulesets/phases/http_request_firewall_managed/entrypoint" \
  --request PUT \
  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  --json '{
    "rules": [{
      "action": "execute",
      "action_parameters": { "id": "b7cd52df92f74c848cec0c2ed385e336" },
      "expression": "true"
    }]
  }'

Enable via Dashboard

Security > Settings > AI Security for Apps > Managed Ruleset > Enable
Action: Log
Configure payload logging to allow decryption of prompts in Security Analytics

Setup Steps — AI Security for Apps (WAF)

Enable AI Security for Apps Dashboard: Security > Settings > filter by "Detection tools" > Toggle AI Security for Apps On.

Or via API:

curl "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/firewall-for-ai/settings" \
  --request PUT \
  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  --json '{ "pii_detection_enabled": true }'

Label LLM endpoints with cf-llm Auto-discovery handles this in most cases. To manually apply: Security > Web Assets > Endpoints > Edit endpoint labels > cf-llm.
Enable the AI Security Log Mode Ruleset Turn on with action = Log. Enable payload logging so you can decrypt and read actual prompts in Security Analytics.
Review detections in Security Analytics Filter by the cf-llm label. Decrypt payloads. Note injection scores, PII categories, and unsafe topic rates across your traffic baseline.
Define custom topics (if needed) Use the dashboard or API to define up to 20 custom topics. Use verb-phrase topic strings for best precision.
Build production custom rules Create WAF Custom Rules using cf.llm.* fields with tuned thresholds. Start with Log action to validate before blocking.
Switch to Block and iterate Once rules are validated, change actions to Block. Monitor continuously and adjust thresholds as traffic patterns evolve.

Example WAF Rules

Block High-Confidence Prompt Injection

(cf.llm.prompt.injection_score lt 20)

Challenge Moderate-Risk Injection

(cf.llm.prompt.injection_score lt 40)

Block Specific PII Categories Only

(any(cf.llm.prompt.pii_categories[*] in {"CREDIT_CARD" "US_SSN"}))

Log Emails — Block Credit Cards and SSNs

Rule 1 — Block:

(any(cf.llm.prompt.pii_categories[*] in {"CREDIT_CARD" "US_SSN"}))

Rule 2 — Log:

(any(cf.llm.prompt.pii_categories[*] in {"EMAIL_ADDRESS"}))

Block Specific Unsafe Topics

(any(cf.llm.prompt.unsafe_topic_categories[*] in {"S1" "S10"}))

Layered — Injection + Bot Score + Geo (Low False Positives)

(cf.llm.prompt.injection_score lt 25
  and cf.bot_management.score lt 10
  and ip.geoip.country ne "US")

Block Injection from Automated Sources

(cf.llm.prompt.injection_score lt 30 and cf.bot_management.score lt 20)

Combined — Injection + PII (Common Attack Pattern)

(cf.llm.prompt.injection_score lt 40 and cf.llm.prompt.pii_detected)

Block Custom Topic (Competitors)

(cf.llm.prompt.custom_topic_categories["competitors"] lt 30)

Scope Rule to Specific Endpoint + Custom PII Format

(http.request.uri.path eq "/api/chat"
  and http.request.body.raw matches "EMP-[0-9]{6}")

Rate Limit High-Token Prompts (Cost Control)

(cf.llm.prompt.token_count gt 2000
  and http.request.uri.path eq "/api/chat")

Allow Financial PII Only from Internal Network

(cf.llm.pii.detected eq true
  and not ip.src in {10.0.0.0/8 172.16.0.0/12 192.168.0.0/16})

AI Gateway — Complementary Security Layer

AI Gateway is available on all Cloudflare plans at no cost. It sits between your app and LLM providers, adding security, observability, and performance features that the WAF layer cannot provide (e.g., response scanning, caching, model fallback).

🛡

Guardrails BETA

Content moderation on prompts AND model responses — block or flag by hazard category.

🔐

DLP BETA

Scan prompts and responses for PII, credentials, source code, jailbreak intent, financial data.

📍

Rate Limiting

Fixed or sliding window limits per gateway. Returns 429 when exceeded.

⚡

Caching

Cache identical prompt→response pairs. cf-aig-cache-status: HIT/MISS header.

🔃

Dynamic Routing

Model fallback and request retry — define alternate providers in JSON config.

🔒

Authenticated Gateway

Require cf-aig-authorization header — prevents direct bypass.

📝

Logging

Full prompt/response logging with conversation_id for audit trail reconstruction.

💸

Cost Analytics

Token counts and estimated costs per provider — visible in AI Gateway Analytics.

AI Gateway — Guardrails

Guardrails intercept and evaluate both user prompts and model responses for harmful content before they reach the user or LLM. The feature works as a proxy between your application and model providers.

Configuration Options

Setting	Description
Evaluation scope	User prompts only, model responses only, or both
Hazard categories	Select which categories to monitor — e.g., violence, hate, self-harm
Action per category	Block (return 400) or Flag (log but allow through)

Setup Steps

Dashboard → AI → AI Gateway → select your gateway → Settings
Enable Guardrails
Set evaluation scope: user prompts, model responses, or both
Select hazard categories to monitor and set action per category (block or flag)

AI Gateway — Data Loss Prevention (DLP)

AI Gateway DLP uses the same detection engine as Cloudflare's enterprise DLP product to scan AI traffic in real-time. It scans both incoming prompts and outgoing model responses.

DLP Detection Categories

Type	What It Detects
Content: PII	Names, SSNs, email addresses in prompts
Content: Credentials & Secrets	API keys, passwords, tokens, connection strings
Content: Source Code	Code snippets, algorithms, proprietary logic
Content: Customer Data	Customer names, projects, confidential business context
Content: Financial Information	Financial numbers, confidential business data
Intent: PII	Prompt requesting specific personal information about individuals
Intent: Code Abuse	Prompt requesting malicious code, exploits, or attack tools
Intent: Jailbreak	Prompt attempting to circumvent AI safety policies

✅

Bidirectional scanning: Enable DLP on both prompts and responses. LLM responses can leak PII from training data — scanning only prompts leaves you exposed on the response side.

AI Gateway — Setup

Create a Gateway

Dashboard → AI → AI Gateway → Create Gateway
Name your gateway (64 character limit)
Connect your application by routing AI provider calls through the gateway URL
Configure settings: Authentication, Rate Limiting, Caching, Guardrails, DLP

Enable DLP

Select your gateway → Firewall tab
Toggle Data Loss Prevention (DLP) to On
Add DLP policies — select detection entries and set action (block / log)

Enable Rate Limiting

Select your gateway → Settings
Enable Rate-limiting
Set rate, time period, and strategy (fixed window or sliding window)

Enable Caching

Select your gateway → Settings
Enable Cache Responses
Set default cache TTL. Override per-request by passing cf-aig-cache-ttl header

ℹ

Check cf-aig-cache-status: HIT or MISS in response headers to verify caching behavior. Currently caching applies only to identical requests with text or image responses.

OWASP Top 10 for LLMs — Coverage Map

OWASP LLM Risk	Cloudflare Feature	WAF Field / Tool
LLM01 Prompt Injection	AI Security for Apps: Injection Detection	`cf.llm.prompt.injection_score`
LLM02 Sensitive Info Disclosure	AI Security for Apps: PII Detection + AI Gateway DLP	`cf.llm.prompt.pii_categories` + DLP profiles
LLM06 Excessive Agency / Misuse	WAF Rate Limiting + AI Gateway Rate Limiting	Rate limiting rules + `cf.llm.prompt.token_count`
LLM08 Vector and Embedding Weaknesses	AI Gateway Guardrails (response scanning)	Guardrails hazard categories
LLM09 Misinformation / Unsafe Output	AI Security for Apps: Unsafe Topic Detection + Guardrails	`cf.llm.prompt.unsafe_topic_categories`
Jailbreak Policy Bypass	AI Gateway DLP: Intent: Jailbreak + Injection Score	DLP intent detection + injection score < 20

Recommended Deployment Workflow

1. Label endpoints

Apply cf-llm label via API Shield (auto-discovery + manual)

2. Enable Log Mode

Turn on AI Security Log Mode Ruleset — action = Log. Enable payload logging.

3. Review Security Analytics

Decrypt payloads, correlate prompts with scores. Understand baseline traffic patterns.

4. Define custom topics

Add org-specific topics via API using verb-phrase topic strings.

5. Build custom rules (Log action)

Create WAF Custom Rules with tuned thresholds. Keep on Log to validate.

6. Switch to Block

Once validated, change custom rule actions to Block. Disable Log Mode or keep for monitoring.

7. Set up AI Gateway

Add Guardrails, DLP (prompt + response), rate limiting, caching, and authenticated access.

8. Monitor and iterate

Continuously review Security Analytics. Adjust thresholds and topic strings as needed.

ℹ

Custom rules (evaluated earlier in the pipeline) run before the managed ruleset. Set custom rules to Log during the transition period to run both modes in parallel before committing to Block.

Optimization Best Practices

Security

Practice	Reason
Never block on `pii_detected` alone	Generates massive false positives — PERSON, DATE_TIME, LOCATION appear in normal conversation
Start injection threshold at `lt 30`	`lt 50` is too aggressive; tune based on log review before switching to Block
Use verb-phrase topic strings	"asking for financial advice" is far more precise than "financial advice" — avoids passive-mention false positives
Layer injection score + bot score + geo	Each signal alone may produce false positives; combined they identify high-confidence attack patterns
Enable bidirectional DLP	LLM responses can leak PII from training data — scanning only prompts leaves you exposed on the output side
Use Log Mode + payload logging first	Lets you see actual prompts alongside detection scores before enforcing blocking
Scope rules to specific URI paths	Avoids unnecessary scanning of non-LLM endpoints and reduces false positives
Avoid semantically overlapping topics	"financial advice" and "investment guidance" cover the same thing — wastes your 20-topic budget
Authenticate your AI Gateway	Require `cf-aig-authorization` header to prevent direct bypass of AI Gateway controls

Cost & Performance

Practice	Reason
Enable caching in AI Gateway	Identical prompts return cached responses — major cost savings for support bots with limited prompt options
Use `token_count` for rate limiting	High-token prompts are expensive — limit them to control LLM inference costs
Set rate limits at gateway level	Prevents runaway costs from abuse or application bugs
Use dynamic routing / model fallback	Increases resilience without manual intervention on provider downtime
Monitor token usage in analytics	AI Gateway tracks token counts and estimated costs per provider — identify expensive patterns early

Operational Visibility

Practice	Reason
Use `conversation_id` in logs	Reconstruct full interaction context during incident investigation — filter by ID in Gateway logs
Enable encrypted payload logging	Log full prompts securely — decrypt only when needed for forensic review, protecting user data at rest
Review Security Overview alerts	Suspicious AI traffic is automatically surfaced — set up alerts for anomalous spikes
Monitor and iterate continuously	Threat patterns and traffic baselines evolve — static thresholds degrade in precision over time

Documentation Links

🔗 AI Security for Apps Overview 🔗 Get Started Guide 🔗 Prompt Injection Detection 🔗 PII Detection 🔗 Unsafe Topic Detection 🔗 Example WAF Rules 🔗 Reference Architecture 🔗 AI Gateway Overview 🔗 AI Gateway Guardrails 🔗 AI Gateway DLP 🔗 AI Gateway Caching 🔗 AI Gateway Rate Limiting 🔗 Authenticated Gateway 🔗 Dynamic Routing / Fallback 🔗 OWASP Top 10 for LLMs