LLM Judge
The judge transform calls a large language model to produce an allow or deny decision for outbound HTTP requests. Each judge instance carries its own natural-language policy, LLM backend, and URL rules. You can deploy zero, one, or many judges with different prompts scoped to different destinations.
The judge is configured as a transform in your iron-proxy YAML config.
How It Works
Each judge instance runs in the transform pipeline with its own state:
- Rule matching: when a request matches the instance’s rules, the judge runs. Non-matching requests pass through untouched, with no LLM call and no annotations.
- Envelope construction: the request is serialized into a JSON envelope (method, URL, headers, body) with per-field size caps and priority ordering for security-relevant headers.
- LLM call: the envelope is sent to the configured provider along with a system prompt that embeds the operator’s policy. The system prompt instructs the model to return a bare JSON decision.
- Decision: the model returns {"decision":"ALLOW", ...} or {"decision":"DENY", ...}. A deny short-circuits the pipeline with HTTP 403. An allow continues to the next transform.
- Fallback: on LLM error, timeout, malformed output, or open circuit breaker, the configured fallback applies (see Failure Handling).
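The decision step amounts to parsing a bare JSON object out of the model's reply and rejecting anything else. A minimal sketch in Python; `parse_decision` is an illustrative helper, not part of iron-proxy's API:

```python
import json

def parse_decision(raw: str):
    """Parse the model's bare-JSON reply.

    Returns "ALLOW" or "DENY", or None when the output is unusable
    (invalid JSON, or a decision field outside the two valid values),
    in which case the configured fallback applies.
    """
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None
    decision = obj.get("decision") if isinstance(obj, dict) else None
    return decision if decision in ("ALLOW", "DENY") else None
```

Anything that is not exactly ALLOW or DENY is treated as a failure rather than guessed at, which is what makes the fallback path well-defined.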
Invariants
The judge is an additional rejection layer. It cannot override iron-proxy’s other controls:
- The judge can only reject. It never approves a request that the static allowlist would have denied. The allowlist’s deny always wins.
- Non-matching requests are ignored. If a request does not match the instance’s rules, no LLM call is made and no audit annotations are written.
- Default-deny semantics are preserved. If an instance skips on failure, the rest of the pipeline still applies, and unmatched requests are still blocked by the allowlist.
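How these invariants compose can be sketched as a single predicate. The names below are illustrative, not iron-proxy's actual internals:

```python
def final_verdict(allowlist_allows: bool, judge_matched: bool, judge_allows: bool) -> bool:
    # The static allowlist's deny always wins.
    if not allowlist_allows:
        return False
    # The judge is purely an additional rejection layer: it can veto
    # a request it matched, but never grant access on its own.
    if judge_matched and not judge_allows:
        return False
    return True
```

Note that `judge_allows` is irrelevant when the allowlist denies, and `judge_matched=False` leaves the allowlist's verdict untouched.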
Pipeline Ordering
The relative position of the judge and secrets transforms in your configuration determines what the LLM provider sees.
Recommended: place the judge before the secrets transform. The LLM provider sees proxy tokens, never the real credentials the workload has access to. This is the safer default and is the only placement compatible with typical threat models.
```yaml
transforms:
  - name: allowlist
    config: {...}
  - name: judge      # runs first: LLM sees proxy tokens only
    config: {...}
  - name: secrets    # real credentials injected here
    config: {...}
```

Alternative: place the judge after secrets. The judge evaluates the exact wire form that will egress, including any injected credentials. Only choose this if your threat model accepts sending real secrets to the LLM provider.
Configuration
A judge instance is a single entry under transforms:. See the configuration reference for the full schema.
```yaml
transforms:
  - name: judge
    config:
      name: "github-write-guard"
      fallback: "deny"
      timeout: "8s"
      max_concurrent: 100
      circuit_breaker:
        consecutive_failures: 5
        cooldown: "10s"
      rules:
        - host: "api.github.com"
          methods: ["POST", "PATCH", "DELETE", "PUT"]
      provider:
        type: "anthropic"
        model: "claude-haiku-4-5-20251001"
        api_key_env: "ANTHROPIC_API_KEY"
        max_tokens: 256
      prompt: |
        This agent performs code review on the repository under review.
        Allow writes to the comments and reviews endpoints of the specific
        repository under review. Deny writes to user settings, organization
        management, billing, or any repository the agent is not reviewing.
```

Writing a Good Policy
The prompt field is a natural-language description of what is allowed. A few guidelines:
- Keep it short and specific. A focused policy produces more consistent decisions than a long one. Aim for a paragraph or two.
- Scope the judge’s rules to the smallest set of destinations the policy covers. A judge whose policy only mentions GitHub should have rules that only match GitHub. Everything else should be handled by the static allowlist or other judges.
- State both what is allowed and what is not. Positive and negative examples help the model resolve ambiguity.
- Default to deny in the policy itself. If the policy leaves a case undefined, the system prompt instructs the model to prefer DENY, but being explicit reduces guesswork.
The operator policy is JSON-escaped before being embedded in the system prompt, so quotes, braces, and newlines in your policy are safe. Prompt-injection-shaped text inside the policy is treated as data, not instructions.
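The escaping step amounts to something like the following sketch. The `{POLICY}` placeholder and `embed_policy` helper are illustrative assumptions, not iron-proxy's actual template:

```python
import json

# Hypothetical system-prompt template; the real template is internal to iron-proxy.
SYSTEM_TEMPLATE = "You are a request judge. The operator policy is: {POLICY}"

def embed_policy(policy: str) -> str:
    # json.dumps escapes quotes and turns newlines into the two-character
    # sequence \n, so the policy is embedded as a single inert JSON string
    # rather than raw text that could alter the prompt structure.
    return SYSTEM_TEMPLATE.replace("{POLICY}", json.dumps(policy))
```

Because the policy lands inside a quoted JSON string, instruction-shaped text in it reads as data to the surrounding prompt rather than as new directives.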
Providers
The judge supports two LLM backends: Anthropic and OpenAI. Both call the vendor’s public API and expect an API key supplied via an environment variable on the iron-proxy process.
Anthropic (type: anthropic)
Calls the Messages API at https://api.anthropic.com/v1/messages.
```yaml
provider:
  type: "anthropic"
  model: "claude-haiku-4-5-20251001"
  api_key_env: "ANTHROPIC_API_KEY"
  max_tokens: 256
```

| Field | Type | Default | Description |
|---|---|---|---|
| type | string | required | Must be anthropic. |
| model | string | required | Anthropic model ID (e.g., claude-haiku-4-5-20251001). |
| api_key_env | string | required | Name of the environment variable holding the Anthropic API key. |
| base_url | string | https://api.anthropic.com | Override the API base URL. Useful for testing or gateway deployments. |
| max_tokens | integer | 256 | Maximum tokens in the model response. |
OpenAI (type: openai)
Calls the Chat Completions API at https://api.openai.com/v1/chat/completions.
```yaml
provider:
  type: "openai"
  model: "gpt-5.4-nano"
  api_key_env: "OPENAI_API_KEY"
  max_tokens: 256
```

| Field | Type | Default | Description |
|---|---|---|---|
| type | string | required | Must be openai. |
| model | string | required | OpenAI model ID (e.g., gpt-5.4-nano). |
| api_key_env | string | required | Name of the environment variable holding the OpenAI API key. |
| base_url | string | https://api.openai.com | Override the API base URL. Useful for Azure OpenAI or gateway deployments. |
| max_tokens | integer | 256 | Maximum tokens in the model response. Sent as max_completion_tokens. |
Failure Handling
Every judge instance has its own timeout, semaphore, and circuit breaker. A failing judge never blocks an unrelated judge.
Timeout
The timeout field bounds a single LLM call. On timeout, the call is canceled and the configured fallback applies.
Concurrency
The max_concurrent field caps the number of in-flight LLM calls for this instance. Additional requests wait for a slot. This protects the proxy from runaway concurrency against a slow LLM endpoint.
Circuit Breaker
Each instance has an independent consecutive-failure breaker:
- After consecutive_failures errors in a row, the breaker opens and short-circuits subsequent calls for cooldown. During this window, the fallback applies without an LLM call.
- After the cooldown elapses, the breaker enters a half-open state and admits a single probe call. Success closes the breaker; failure reopens it with a fresh cooldown.
- A single successful call resets the failure counter, so transient errors do not slowly accumulate.
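A minimal sketch of this breaker logic, with illustrative names (the half-open state here is simplified to a time check and does not strictly limit probes to one in flight):

```python
class CircuitBreaker:
    """Consecutive-failure breaker with cooldown, per the rules above."""

    def __init__(self, consecutive_failures: int, cooldown: float):
        self.threshold = consecutive_failures
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None   # None means the breaker is closed

    def allow_call(self, now: float) -> bool:
        if self.opened_at is None:
            return True
        if now - self.opened_at >= self.cooldown:
            return True          # half-open: admit a probe call
        return False             # open: apply fallback, no LLM call

    def record(self, success: bool, now: float) -> None:
        if success:
            self.failures = 0    # one success fully resets the counter
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = now   # open (or reopen) with a fresh cooldown
```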
Fallback
The fallback field determines what happens when the LLM call fails or returns something unusable. Two modes are supported:
- deny (default, recommended for production): the request is rejected with HTTP 403. Safe under any failure mode.
- skip: the judge yields to the rest of the pipeline. Since iron-proxy is default-deny, unmatched requests are still blocked, but requests that the static allowlist would accept will no longer be gated by this judge.
A fallback fires on:
- LLM request error (network, 4xx, 5xx).
- LLM call timeout.
- Circuit breaker open.
- Malformed model output (not valid JSON, or decision not ALLOW/DENY).
- Errors reading the request body or building the envelope.
The allow fallback does not exist: the judge cannot upgrade a failure into a successful allow. Pick deny when the judge is a hard gate and skip only when the judge is advisory and the underlying allowlist already enforces safety.
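Put together, the judge's outcome is a pure function of the LLM result and the configured fallback. A sketch with illustrative names:

```python
def judge_outcome(llm_decision, fallback: str) -> str:
    """llm_decision is "ALLOW", "DENY", or None for any failure
    (request error, timeout, open breaker, malformed output)."""
    if llm_decision == "ALLOW":
        return "allow"
    if llm_decision == "DENY":
        return "deny"
    # Failure path: only deny or skip exist -- a failure can never
    # be upgraded into an allow.
    return "deny" if fallback == "deny" else "skip"
```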
Envelope Limits
Before the request is sent to the LLM, it is serialized into a JSON envelope with per-field size caps. Content beyond the cap is dropped and a warning is added so the model can see that truncation happened. The model is instructed to prefer DENY when truncation warnings could plausibly matter.
| Field | Cap | Notes |
|---|---|---|
| Body | 16 KiB | Non-UTF-8 bodies are omitted entirely with a warning. Multipart bodies beyond the cap emit a placeholder summary. |
| URL | 2 KiB | Truncated with a warning showing the original length. |
| Headers (total) | 4 KiB | Security-relevant headers (Host, Origin, Referer, X-Forwarded-For, X-Forwarded-Host, Content-Type, Content-Length, Content-Encoding, Transfer-Encoding, Authorization, Cookie) are emitted first, then the rest in alphabetical order. |
| Header value | 512 bytes | Values longer than this are truncated with a marker noting the original length. |
The priority-header ordering defeats header-inflation attacks: even when an attacker packs the envelope with junk headers, the security-relevant ones are always visible to the model.
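The ordering and truncation rules can be sketched as follows; the function names are illustrative and the priority list is taken from the table above:

```python
PRIORITY = ["host", "origin", "referer", "x-forwarded-for", "x-forwarded-host",
            "content-type", "content-length", "content-encoding",
            "transfer-encoding", "authorization", "cookie"]

def order_headers(headers: dict) -> list:
    # Security-relevant headers first, then the rest alphabetically, so
    # junk headers cannot crowd the important ones out of the 4 KiB cap.
    rank = {name: i for i, name in enumerate(PRIORITY)}
    return sorted(headers.items(),
                  key=lambda kv: (rank.get(kv[0].lower(), len(PRIORITY)),
                                  kv[0].lower()))

def cap_header_value(value: str, cap: int = 512) -> str:
    # Long values are truncated with a marker noting the original length.
    if len(value) <= cap:
        return value
    return value[:cap] + f"...[truncated, original length {len(value)}]"
```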
Audit Output
Every matched request adds structured fields under the transform trace:
| Field | Description |
|---|---|
| judge.instance | The instance name. Use this to disambiguate between multiple judges. |
| judge.model | The model ID used for the decision. |
| judge.decision | ALLOW, DENY, FALLBACK_ALLOW, or FALLBACK_DENY. |
| judge.reason | Short justification. For successful calls, comes from the model. For fallbacks, describes the failure. Capped at 512 characters. |
| judge.duration_ms | Total time spent in the judge, in milliseconds, including the LLM call. |
| judge.input_tokens | Tokens the provider billed for the prompt. Present only when the LLM call succeeded. |
| judge.output_tokens | Tokens the provider billed for the response. Present only when the LLM call succeeded. |
| judge.fallback_applied | Present only when a fallback fired. One of deny or skip. |
| judge.circuit_breaker_tripped | Present and set to true only when the breaker was open. |
| judge.raw_output | Present only when decision parsing failed. Contains the first 2 KiB of the raw model output for debugging. |
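A successful deny might produce trace fields along these lines; every value below is illustrative, not real output:

```json
{
  "judge.instance": "github-write-guard",
  "judge.model": "claude-haiku-4-5-20251001",
  "judge.decision": "DENY",
  "judge.reason": "Write targets a repository outside the review scope.",
  "judge.duration_ms": 842,
  "judge.input_tokens": 512,
  "judge.output_tokens": 31
}
```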
Multiple Instances
You can define many judge transforms in a single configuration. Each entry is an independent instance: its own name, prompt, rules, provider, semaphore, and circuit breaker. A failing or slow judge does not affect the others.
```yaml
transforms:
  - name: judge
    config:
      name: "github-write-guard"
      rules:
        - host: "api.github.com"
          methods: ["POST", "PATCH", "DELETE", "PUT"]
      provider:
        type: "anthropic"
        model: "claude-haiku-4-5-20251001"
        api_key_env: "ANTHROPIC_API_KEY"
      prompt: |
        Allow writes to the repository under review. Deny writes to user
        settings, billing, or any other repository.
  - name: judge
    config:
      name: "slack-dm-guard"
      rules:
        - host: "slack.com"
          paths: ["/api/chat.postMessage"]
      provider:
        type: "openai"
        model: "gpt-5.4-nano"
        api_key_env: "OPENAI_API_KEY"
      prompt: |
        Allow posts to the #release-bot channel. Deny posts to any other
        channel or direct messages.
```

A single request may match more than one instance, and each matched instance runs independently. If two instances both match, both make an LLM call and both must allow for the request to proceed.