Firewall for AI & LLMs
Complete reference: WAF-integrated LLM threat detection for prompt injection, PII exposure, and unsafe topics — plus AI Gateway security features including Guardrails, DLP, rate limiting, and caching.
What It Is
AI Security for Apps extends the Cloudflare WAF with detections specifically designed for LLM-powered applications. It is model-agnostic — it works regardless of which LLM you use (OpenAI, Anthropic, Google Gemini, Workers AI, self-hosted, etc.).
It is complemented by AI Gateway, a proxy layer (available on all plans) that adds Guardrails, DLP, rate limiting, caching, and full prompt/response logging between your app and the LLM provider.
Prompt Injection
Detect attackers trying to hijack the LLM's behavior by overriding its system instructions or extracting the system prompt.
PII Detection
Catch users inadvertently or maliciously sending sensitive personal data (SSNs, credit cards, emails) in prompts.
Unsafe Topics
Block prompts covering harmful subjects — violent crimes, hate speech, self-harm, CSAM, weapons — across 14 categories.
Custom Topics
Define up to 20 organization-specific topics — competitors, legal advice, internal HR matters — using zero-shot classification.
Plan Availability
| Capability | Free | Pro | Business | Enterprise |
|---|---|---|---|---|
| LLM endpoint discovery (cf-llm auto-label) | Yes | Yes | Yes | Yes |
| AI Security Log Mode Ruleset (full prompt logging) | No | No | No | Paid add-on |
| AI detection fields — PII, injection score, unsafe topics, custom topics | No | No | No | Paid add-on |
| AI Gateway (rate limiting, caching, logging, Guardrails, DLP) | Yes | Yes | Yes | Yes |
Architecture & Traffic Flow
Cloudflare's AI security is a layered stack. Requests pass through WAF-level LLM detection first, then optionally through AI Gateway before reaching the LLM provider.
LLM Endpoint Discovery
Cloudflare automatically detects LLM endpoints using traffic heuristics — no manual
configuration required. Once detected, endpoints are labeled cf-llm via
API Shield, enabling filtering in Security Analytics and scoping of WAF rules.
How Heuristics Work
| Signal | Detail |
|---|---|
| Response time | LLM endpoints typically respond in >1 second |
| Effective bitrate | 80% of LLM endpoints operate at <4 KB/s (streaming tokens) |
| False positive filtering | GraphQL endpoints, device heartbeats, QR/OTP generators are filtered out automatically |
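The signals in the table can be read as a simple two-threshold check. The sketch below is only an illustration of that reading: the production classifier is not public, and `EndpointSample`, `known_non_llm`, and `looks_like_llm_endpoint` are hypothetical names invented here.

```python
from dataclasses import dataclass

# Thresholds copied from the signals table; the production classifier is not public.
MIN_RESPONSE_SECONDS = 1.0
MAX_BYTES_PER_SECOND = 4 * 1024


@dataclass
class EndpointSample:
    path: str
    response_seconds: float      # time taken to deliver the full response
    response_bytes: int          # size of the response body
    known_non_llm: bool = False  # hypothetical stand-in for the GraphQL/heartbeat/OTP filters


def looks_like_llm_endpoint(sample: EndpointSample) -> bool:
    """Apply the false-positive filter first, then the two traffic signals."""
    if sample.known_non_llm:
        return False
    if sample.response_seconds <= MIN_RESPONSE_SECONDS:
        return False
    bitrate = sample.response_bytes / sample.response_seconds
    return bitrate < MAX_BYTES_PER_SECOND
```

A slow, low-bitrate response (for example 12 KB streamed over 6 seconds) passes both checks, while a fast JSON API call or a large file download does not.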
You can also apply the cf-llm label to specific endpoints manually via Security > Web Assets > Endpoints > Edit endpoint labels, or apply it in bulk via API Shield's endpoint management API.
Only requests with Content-Type: application/json are scanned. Non-JSON LLM requests are not scanned.
PII Detection
Two complementary approaches that can be combined for layered protection:
Fuzzy Detection (AI-powered)
Uses Microsoft Presidio to detect PII even in natural language or unexpected formats. Supports 40+ categories.
Do not block on cf.llm.prompt.pii_detected alone. Broad categories like PERSON, DATE_TIME, and LOCATION appear in normal conversation and will generate large numbers of false positives. Always filter by specific categories using cf.llm.prompt.pii_categories.
Exact Detection (Regex)
Use WAF custom rules with http.request.body.raw matches "PATTERN" for
organization-specific formats:
| Custom PII type | Example format | Regex pattern |
|---|---|---|
| Employee ID | EMP-482910 | EMP-[0-9]{6} |
| Patient record number | PAT/2024/00391 | PAT/[0-9]{4}/[0-9]{5} |
| Internal account ID | ACCT-XX-99999 | ACCT-[A-Z]{2}-[0-9]{5} |
| Custom API key prefix | sk_live_abc123... | sk_live_[a-zA-Z0-9]{20,} |
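Before deploying patterns like these in a rule, it can help to check them offline against sample request bodies. The following is a minimal sketch using Python's `re` module; it assumes the WAF `matches` operator searches anywhere in the body, and `CUSTOM_PII_PATTERNS` and `find_custom_pii` are names made up for this example. Regex engines differ in details, so treat the result as a sanity check only.

```python
import re

# Patterns copied from the table above; the dictionary keys are illustrative labels.
CUSTOM_PII_PATTERNS = {
    "employee_id": r"EMP-[0-9]{6}",
    "patient_record": r"PAT/[0-9]{4}/[0-9]{5}",
    "internal_account": r"ACCT-[A-Z]{2}-[0-9]{5}",
    "api_key": r"sk_live_[a-zA-Z0-9]{20,}",
}


def find_custom_pii(body: str) -> list[str]:
    """Return the sorted labels of every pattern found anywhere in the body."""
    return sorted(
        label
        for label, pattern in CUSTOM_PII_PATTERNS.items()
        if re.search(pattern, body)
    )
```

Running it over a few known-good and known-bad samples shows quickly whether a pattern is too loose or too strict, for example that an ID with only five digits is not flagged.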
Prompt Injection Detection
Score-based system using the cf.llm.prompt.injection_score field. Range: 1–99.
Lower score = higher injection risk.
| Score range | Risk level | Recommended action |
|---|---|---|
| 1 – 19 | High — strongly resembles known injection patterns | Block |
| 20 – 49 | Moderate — some injection characteristics, may be ambiguous | Challenge or Log |
| 50 – 99 | Low — likely safe, normal user input | Allow |
A score-based approach is used rather than a binary result because injection exists on a spectrum. A creative writing request may superficially resemble an injection attempt without actually being one.
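The score bands in the table can be written down as a small lookup, which is useful when replaying logged scores to see how many requests each action would have affected. This is a sketch of the table only; `recommended_action` and its return strings are names chosen for this example.

```python
def recommended_action(injection_score: int) -> str:
    """Map an injection score (1-99, lower = riskier) to the table's recommended action."""
    if not 1 <= injection_score <= 99:
        raise ValueError("injection score must be between 1 and 99")
    if injection_score < 20:
        return "block"             # high risk: strongly resembles known injection patterns
    if injection_score < 50:
        return "challenge_or_log"  # moderate risk: ambiguous input
    return "allow"                 # low risk: likely normal user input
```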
Start with a Log action at lt 40. Review results in Security Analytics, then tune down to lt 30 or lt 20 based on actual false positive rates before switching to Block.
Unsafe & Custom Topic Detection
Predefined Unsafe Topics (14 categories)
| Category | Description |
|---|---|
| S1 | Violent crimes |
| S2 | Non-violent crimes |
| S3 | Sex-related crimes |
| S4 | Child sexual exploitation |
| S5 | Defamation |
| S6 | Specialized advice |
| S7 | Privacy |
| S8 | Intellectual property |
| S9 | Indiscriminate weapons |
| S10 | Hate |
| S11 | Suicide and self-harm |
| S12 | Sexual content |
| S13 | Elections |
| S14 | Code interpreter abuse |
Custom Topic Detection
Define up to 20 custom topics using zero-shot classification — no model training required. Each topic has a label (used in rules) and a topic string (used by the AI classifier). Scores follow the 1–99 scale (lower = more relevant to the topic).
Constraints
| Parameter | Limit |
|---|---|
| Maximum number of topics | 20 |
| Topic string length | 2–50 printable ASCII characters |
| Label length | 2–20 characters |
| Label format | Lowercase letters, numbers, and hyphens only |
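These limits are easy to check locally before sending a topic list to the API. The validator below is a sketch of the constraints in the table, not of any server-side behavior; `validate_topics` and its error messages are invented for this example.

```python
import re

MAX_TOPICS = 20
LABEL_RE = re.compile(r"[a-z0-9-]{2,20}")  # lowercase letters, numbers, hyphens; 2-20 chars


def is_printable_ascii(text: str) -> bool:
    """True when every character is in the printable ASCII range (space through tilde)."""
    return all(" " <= ch <= "~" for ch in text)


def validate_topics(topics: list[dict]) -> list[str]:
    """Return a list of human-readable problems; an empty list means the input passes."""
    errors = []
    if len(topics) > MAX_TOPICS:
        errors.append(f"too many topics: {len(topics)} > {MAX_TOPICS}")
    for item in topics:
        label = item.get("label", "")
        topic = item.get("topic", "")
        if not LABEL_RE.fullmatch(label):
            errors.append(f"invalid label: {label!r}")
        if not 2 <= len(topic) <= 50 or not is_printable_ascii(topic):
            errors.append(f"invalid topic string for label {label!r}")
    return errors
```

Topic strings that read well in prose often exceed 50 characters, so a check like this tends to catch length problems before the API does.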
Define Custom Topics via API
    curl "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/firewall-for-ai/custom_topics" \
      --request PUT \
      --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
      --json '{
        "topics": [
          { "label": "competitors", "topic": "asking about competitor products and pricing" },
          { "label": "legal-advice", "topic": "asking for legal counsel or regulatory guidance" },
          { "label": "hr-internal", "topic": "internal HR policies and employee matters" }
        ]
      }'
The PUT request replaces your entire topic list. Always include all topics you want to keep, not just new ones.
Topic String Best Practices
| Style | Example | Verdict |
|---|---|---|
| Verb phrase (recommended) | asking for investment advice | Best precision |
| Sentence-like | a user seeking financial guidance | Good |
| Noun phrase | investment advice | Acceptable |
| Single keyword | finance | Too broad |
| Vague phrase | bad things | Ineffective |
Example Custom Topics
| Label | Topic string | Use case |
|---|---|---|
| competitors | asking about Acme Corp products and pricing | Block chatbot discussing rival offerings |
| legal-advice | asking for legal counsel or regulatory guidance | Block prompts soliciting legal advice |
| student-data | requesting student personal or academic records | EdTech: prevent student data exposure |
| crypto-advice | asking for crypto trading or investment advice | FinTech: block crypto investment tips |
| exec-internal | discussing executive decisions or leadership moves | Prevent internal matter leakage |
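Because a PUT to the custom topics endpoint replaces the whole list, adding one topic safely means sending the existing topics together with the new ones. The helper below is a minimal sketch of that merge step. `merge_topics` is a hypothetical name, and how the current list is obtained (for example from your own configuration store) is deliberately left open.

```python
MAX_TOPICS = 20


def merge_topics(existing: list[dict], additions: list[dict]) -> dict:
    """Build the full PUT payload: existing topics plus additions.

    An addition with the same label as an existing topic replaces its topic string.
    """
    merged = {item["label"]: item for item in existing}
    for item in additions:
        merged[item["label"]] = item
    if len(merged) > MAX_TOPICS:
        raise ValueError(f"{len(merged)} topics exceeds the limit of {MAX_TOPICS}")
    return {"topics": list(merged.values())}
```

The returned dictionary has the same shape as the JSON body in the curl example, so it can be serialized and sent as a single request.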
Detection Fields Reference
These fields are populated by AI Security for Apps on requests hitting cf-llm labeled
endpoints and can be used in WAF Custom Rules and Rate Limiting Rules.
| Field | Type | Description |
|---|---|---|
| cf.llm.prompt.detected | Boolean | LLM prompt was detected in the request |
| cf.llm.prompt.pii_detected | Boolean | Any PII found in the prompt (do not block on this alone) |
| cf.llm.prompt.pii_categories | Array<String> | PII types found (CREDIT_CARD, US_SSN, EMAIL_ADDRESS, etc.) |
| cf.llm.prompt.injection_score | Number (1–99) | Injection likelihood (lower = more dangerous) |
| cf.llm.prompt.unsafe_topic_detected | Boolean | Any predefined unsafe topic detected in prompt |
| cf.llm.prompt.unsafe_topic_categories | Array<String> | Which unsafe categories detected (S1–S14) |
| cf.llm.prompt.custom_topic_categories | Map<Number> | Custom topic relevance scores by label (1–99, lower = more relevant) |
| cf.llm.prompt.token_count | Number | Estimated token count of the prompt; useful for cost-based rate limiting |
Log Mode vs Production Mode
| Feature | Log Mode | Production Mode |
|---|---|---|
| How it works | Pre-built managed ruleset | Custom WAF rules using detection fields |
| Prompt logging | Yes (encrypted payload logging) | No — metadata only |
| Response logging | No | No — use AI Gateway |
| Policy flexibility | Limited — 3 fixed rules | Full — scores, categories, combined signals |
| Blocking behavior | Default WAF block page | Fully customizable responses |
| Best for | Evaluation and threshold tuning | Production enforcement |
Enable Log Mode via API
    curl "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/rulesets/phases/http_request_firewall_managed/entrypoint" \
      --request PUT \
      --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
      --json '{
        "rules": [{
          "action": "execute",
          "action_parameters": { "id": "b7cd52df92f74c848cec0c2ed385e336" },
          "expression": "true"
        }]
      }'
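The request above sends a rule list containing only the Log Mode rule. If a PUT to the phase entry point replaces the existing rule list, as the request shape suggests, zones that already run other managed rulesets would want to append the rule rather than overwrite the list. The sketch below shows that append step only. It is an assumption-laden illustration: `with_log_mode_rule` is a hypothetical helper, and fetching and sending the rules is not shown.

```python
# Ruleset ID copied from the curl example above.
AI_SECURITY_LOG_RULESET_ID = "b7cd52df92f74c848cec0c2ed385e336"


def with_log_mode_rule(existing_rules: list[dict]) -> list[dict]:
    """Return a new rule list that executes the Log Mode ruleset exactly once."""
    for rule in existing_rules:
        params = rule.get("action_parameters", {})
        if rule.get("action") == "execute" and params.get("id") == AI_SECURITY_LOG_RULESET_ID:
            return list(existing_rules)  # already present, nothing to add
    log_mode_rule = {
        "action": "execute",
        "action_parameters": {"id": AI_SECURITY_LOG_RULESET_ID},
        "expression": "true",
    }
    return [*existing_rules, log_mode_rule]
```

The result would go in the `rules` field of the JSON body. Calling the helper twice leaves the list unchanged, which makes it safe to use in repeatable deployment scripts.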
Enable via Dashboard
Security > Settings > AI Security for Apps > Managed Ruleset > Enable
Action: Log
Configure payload logging to allow decryption of prompts in Security Analytics
Setup Steps — AI Security for Apps (WAF)
- Enable AI Security for Apps. Dashboard: Security > Settings > filter by "Detection tools" > toggle AI Security for Apps On. Or via API:

      curl "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/firewall-for-ai/settings" \
        --request PUT \
        --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
        --json '{ "pii_detection_enabled": true }'

- Label LLM endpoints with cf-llm. Auto-discovery handles this in most cases. To apply the label manually: Security > Web Assets > Endpoints > Edit endpoint labels > cf-llm.
- Enable the AI Security Log Mode Ruleset. Turn it on with action = Log. Enable payload logging so you can decrypt and read actual prompts in Security Analytics.
- Review detections in Security Analytics. Filter by the cf-llm label. Decrypt payloads. Note injection scores, PII categories, and unsafe topic rates across your traffic baseline.
- Define custom topics (if needed). Use the dashboard or API to define up to 20 custom topics. Use verb-phrase topic strings for best precision.
- Build production custom rules. Create WAF Custom Rules using cf.llm.* fields with tuned thresholds. Start with the Log action to validate before blocking.
- Switch to Block and iterate. Once rules are validated, change actions to Block. Monitor continuously and adjust thresholds as traffic patterns evolve.
Example WAF Rules
Block High-Confidence Prompt Injection
(cf.llm.prompt.injection_score lt 20)
Challenge Moderate-Risk Injection
(cf.llm.prompt.injection_score lt 40)
Block Specific PII Categories Only
(any(cf.llm.prompt.pii_categories[*] in {"CREDIT_CARD" "US_SSN"}))
Log Emails — Block Credit Cards and SSNs
Rule 1 — Block:
(any(cf.llm.prompt.pii_categories[*] in {"CREDIT_CARD" "US_SSN"}))
Rule 2 — Log:
(any(cf.llm.prompt.pii_categories[*] in {"EMAIL_ADDRESS"}))
Block Specific Unsafe Topics
(any(cf.llm.prompt.unsafe_topic_categories[*] in {"S1" "S10"}))
Layered — Injection + Bot Score + Geo (Low False Positives)
(cf.llm.prompt.injection_score lt 25
and cf.bot_management.score lt 10
and ip.geoip.country ne "US")
Block Injection from Automated Sources
(cf.llm.prompt.injection_score lt 30 and cf.bot_management.score lt 20)
Combined — Injection + PII (Common Attack Pattern)
(cf.llm.prompt.injection_score lt 40 and cf.llm.prompt.pii_detected)
Block Custom Topic (Competitors)
(cf.llm.prompt.custom_topic_categories["competitors"] lt 30)
Scope Rule to Specific Endpoint + Custom PII Format
(http.request.uri.path eq "/api/chat"
and http.request.body.raw matches "EMP-[0-9]{6}")
Rate Limit High-Token Prompts (Cost Control)
(cf.llm.prompt.token_count gt 2000
and http.request.uri.path eq "/api/chat")
Allow Financial PII Only from Internal Network
(cf.llm.prompt.pii_detected eq true
and not ip.src in {10.0.0.0/8 172.16.0.0/12 192.168.0.0/16})
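To see how several of these expressions interact, it can help to replay logged field values through a small offline model before touching live rules. The sketch below combines the PII category rules with two of the injection rules. It is a simplification: `decide` and its action strings are invented here, and the real WAF applies rules in their configured order, with Log actions not stopping evaluation.

```python
BLOCK_PII = {"CREDIT_CARD", "US_SSN"}
LOG_PII = {"EMAIL_ADDRESS"}


def decide(pii_categories: list[str], injection_score: int, bot_score: int) -> str:
    """Return the first matching action for one request's detection fields."""
    categories = set(pii_categories)
    if categories & BLOCK_PII:
        return "block"      # any(pii_categories[*] in {"CREDIT_CARD" "US_SSN"})
    if injection_score < 30 and bot_score < 20:
        return "block"      # injection from automated sources
    if injection_score < 40:
        return "challenge"  # moderate-risk injection
    if categories & LOG_PII:
        return "log"        # emails are logged, not blocked
    return "allow"
```

Broad categories such as PERSON fall through to "allow" here, which mirrors the advice to act only on specific PII categories.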
AI Gateway — Complementary Security Layer
AI Gateway is available on all Cloudflare plans at no cost. It sits between your app and LLM providers, adding security, observability, and performance features that the WAF layer cannot provide (e.g., response scanning, caching, model fallback).
Guardrails BETA
Content moderation on prompts AND model responses — block or flag by hazard category.
DLP BETA
Scan prompts and responses for PII, credentials, source code, jailbreak intent, financial data.
Rate Limiting
Fixed or sliding window limits per gateway. Returns 429 when exceeded.
Caching
Cache identical prompt→response pairs. Responses carry a cf-aig-cache-status: HIT/MISS header.
Dynamic Routing
Model fallback and request retry — define alternate providers in JSON config.
Authenticated Gateway
Require cf-aig-authorization header — prevents direct bypass.
Logging
Full prompt/response logging with conversation_id for audit trail reconstruction.
Cost Analytics
Token counts and estimated costs per provider — visible in AI Gateway Analytics.
AI Gateway — Guardrails
Guardrails intercept and evaluate both user prompts and model responses for harmful content before they reach the user or LLM. The feature works as a proxy between your application and model providers.
Configuration Options
| Setting | Description |
|---|---|
| Evaluation scope | User prompts only, model responses only, or both |
| Hazard categories | Select which categories to monitor — e.g., violence, hate, self-harm |
| Action per category | Block (return 400) or Flag (log but allow through) |
Setup Steps
- Dashboard → AI → AI Gateway → select your gateway → Settings
- Enable Guardrails
- Set evaluation scope: user prompts, model responses, or both
- Select hazard categories to monitor and set action per category (block or flag)
AI Gateway — Data Loss Prevention (DLP)
AI Gateway DLP uses the same detection engine as Cloudflare's enterprise DLP product to scan AI traffic in real-time. It scans both incoming prompts and outgoing model responses.
DLP Detection Categories
| Type | What It Detects |
|---|---|
| Content: PII | Names, SSNs, email addresses in prompts |
| Content: Credentials & Secrets | API keys, passwords, tokens, connection strings |
| Content: Source Code | Code snippets, algorithms, proprietary logic |
| Content: Customer Data | Customer names, projects, confidential business context |
| Content: Financial Information | Financial numbers, confidential business data |
| Intent: PII | Prompt requesting specific personal information about individuals |
| Intent: Code Abuse | Prompt requesting malicious code, exploits, or attack tools |
| Intent: Jailbreak | Prompt attempting to circumvent AI safety policies |
AI Gateway — Setup
Create a Gateway
- Dashboard → AI → AI Gateway → Create Gateway
- Name your gateway (64 character limit)
- Connect your application by routing AI provider calls through the gateway URL
- Configure settings: Authentication, Rate Limiting, Caching, Guardrails, DLP
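Routing calls through the gateway usually amounts to swapping the provider base URL and adding gateway headers. The sketch below builds both without sending anything. The header names come from this document; the base URL pattern, the `Bearer` scheme, and the TTL being expressed in seconds are assumptions to verify against the AI Gateway documentation, and `gateway_request` is a hypothetical helper.

```python
GATEWAY_BASE = "https://gateway.ai.cloudflare.com/v1"  # assumed base URL pattern


def gateway_request(account_id, gateway_id, provider_path,
                    gateway_token=None, cache_ttl_seconds=None):
    """Return (url, headers) for a provider call routed through the gateway."""
    url = f"{GATEWAY_BASE}/{account_id}/{gateway_id}/{provider_path.lstrip('/')}"
    headers = {"Content-Type": "application/json"}
    if gateway_token is not None:
        # Header required when the gateway is configured as authenticated.
        headers["cf-aig-authorization"] = f"Bearer {gateway_token}"
    if cache_ttl_seconds is not None:
        # Per-request override of the default cache TTL.
        headers["cf-aig-cache-ttl"] = str(cache_ttl_seconds)
    return url, headers
```

The provider's own credentials would still be added as usual; they are omitted here to keep the sketch focused on the gateway-specific parts.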
Enable DLP
- Select your gateway → Firewall tab
- Toggle Data Loss Prevention (DLP) to On
- Add DLP policies: select detection entries and set action (block / log)
Enable Rate Limiting
- Select your gateway → Settings
- Enable Rate-limiting
- Set rate, time period, and strategy (fixed window or sliding window)
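The two strategies behave differently around window boundaries, which matters when choosing one. The sketch below implements textbook versions of both so the difference can be observed with explicit timestamps. It is not the gateway's implementation, and the class names are invented for this example.

```python
from collections import deque


class FixedWindowLimiter:
    """Counts requests per aligned window; the counter resets at each boundary."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.current_window = None
        self.count = 0

    def allow(self, now: float) -> bool:
        window_index = int(now // self.window)
        if window_index != self.current_window:
            self.current_window = window_index
            self.count = 0
        if self.count >= self.limit:
            return False  # a gateway would answer 429 here
        self.count += 1
        return True


class SlidingWindowLimiter:
    """Counts requests in the trailing window ending at the current time."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.timestamps = deque()

    def allow(self, now: float) -> bool:
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.limit:
            return False
        self.timestamps.append(now)
        return True
```

With a limit of 2 per 60 seconds and requests at t = 50, 55, 61, and 62, the fixed window admits all four because the counter resets at t = 60, while the sliding window rejects the last two.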
Enable Caching
- Select your gateway → Settings
- Enable Cache Responses
- Set default cache TTL. Override per-request by passing the cf-aig-cache-ttl header
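Caching of identical requests can be pictured as a lookup keyed on a hash of the request, with each entry expiring after its TTL. The sketch below models that idea only; how the gateway actually derives its keys is not described here. `cache_key` and `ResponseCache` are hypothetical names, and normalizing JSON key order is a choice made for this example.

```python
import hashlib
import json


def cache_key(provider: str, body: dict) -> str:
    """Hash the provider name plus a canonical JSON encoding of the request body."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{provider}\n{canonical}".encode("utf-8")).hexdigest()


class ResponseCache:
    def __init__(self, default_ttl: float):
        self.default_ttl = default_ttl
        self._entries = {}  # key -> (expires_at, response)

    def get(self, key: str, now: float):
        """Return ("HIT", response) or ("MISS", None), mirroring the status header values."""
        entry = self._entries.get(key)
        if entry is None or now >= entry[0]:
            return "MISS", None
        return "HIT", entry[1]

    def put(self, key: str, response, now: float, ttl: float = None):
        """Store a response; ttl overrides the default, like a per-request TTL header."""
        lifetime = self.default_ttl if ttl is None else ttl
        self._entries[key] = (now + lifetime, response)
```

Any change to the request, such as a different model name or one extra character in the prompt, produces a different key and therefore a miss.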
Check for cf-aig-cache-status: HIT or MISS in response headers to verify caching behavior. Currently, caching applies only to identical requests with text or image responses.
OWASP Top 10 for LLMs: Coverage Map
| OWASP LLM Risk | Cloudflare Feature | WAF Field / Tool |
|---|---|---|
| LLM01 Prompt Injection | AI Security for Apps: Injection Detection | cf.llm.prompt.injection_score |
| LLM02 Sensitive Info Disclosure | AI Security for Apps: PII Detection + AI Gateway DLP | cf.llm.prompt.pii_categories + DLP profiles |
| LLM06 Excessive Agency / Misuse | WAF Rate Limiting + AI Gateway Rate Limiting | Rate limiting rules + cf.llm.prompt.token_count |
| LLM08 Vector and Embedding Weaknesses | AI Gateway Guardrails (response scanning) | Guardrails hazard categories |
| LLM09 Misinformation / Unsafe Output | AI Security for Apps: Unsafe Topic Detection + Guardrails | cf.llm.prompt.unsafe_topic_categories |
| Jailbreak Policy Bypass | AI Gateway DLP: Intent: Jailbreak + Injection Score | DLP intent detection + injection score < 20 |
Recommended Deployment Workflow
Apply the cf-llm label via API Shield (auto-discovery + manual)
Optimization Best Practices
Security
| Practice | Reason |
|---|---|
| Never block on pii_detected alone | Generates massive false positives: PERSON, DATE_TIME, LOCATION appear in normal conversation |
| Start injection threshold at lt 30 | lt 50 is too aggressive; tune based on log review before switching to Block |
| Use verb-phrase topic strings | "asking for financial advice" is far more precise than "financial advice" — avoids passive-mention false positives |
| Layer injection score + bot score + geo | Each signal alone may produce false positives; combined they identify high-confidence attack patterns |
| Enable bidirectional DLP | LLM responses can leak PII from training data — scanning only prompts leaves you exposed on the output side |
| Use Log Mode + payload logging first | Lets you see actual prompts alongside detection scores before enforcing blocking |
| Scope rules to specific URI paths | Avoids unnecessary scanning of non-LLM endpoints and reduces false positives |
| Avoid semantically overlapping topics | "financial advice" and "investment guidance" cover the same thing — wastes your 20-topic budget |
| Authenticate your AI Gateway | Require cf-aig-authorization header to prevent direct bypass of AI Gateway controls |
Cost & Performance
| Practice | Reason |
|---|---|
| Enable caching in AI Gateway | Identical prompts return cached responses — major cost savings for support bots with limited prompt options |
| Use token_count for rate limiting | High-token prompts are expensive; limit them to control LLM inference costs |
| Set rate limits at gateway level | Prevents runaway costs from abuse or application bugs |
| Use dynamic routing / model fallback | Increases resilience without manual intervention on provider downtime |
| Monitor token usage in analytics | AI Gateway tracks token counts and estimated costs per provider — identify expensive patterns early |
Operational Visibility
| Practice | Reason |
|---|---|
| Use conversation_id in logs | Reconstruct full interaction context during incident investigation; filter by ID in Gateway logs |
| Enable encrypted payload logging | Log full prompts securely — decrypt only when needed for forensic review, protecting user data at rest |
| Review Security Overview alerts | Suspicious AI traffic is automatically surfaced — set up alerts for anomalous spikes |
| Monitor and iterate continuously | Threat patterns and traffic baselines evolve — static thresholds degrade in precision over time |