C Claude Code Internals
| EN | ES

YOLO 分类器

YOLO 分类器是一个 AI 驱动的安全监视器,在自动模式下执行之前评估每个工具调用。它使用专门的系统提示词、对话转录和两阶段 XML 决策协议对 Claude 进行单独的 API 调用。

2 分类器阶段 28 阻止规则类别 6 快速路径 10 安全原则
i 单独查询,不是内联推理
分类器是对 Claude 的单独 API 调用 — 不是代理自己推理的一部分。 它在与代理相同的模型层上运行,并具有从构建时注入的 3 个模板文件构建的专用系统提示词。 只有明确的 <block>no</block> 响应才允许操作。 任何失败都会阻止。

架构

Component File Purpose
Classifier yoloClassifier.ts Main classification engine (~1,495 lines)
Permissions permissions.ts Permission check orchestration and auto mode flow
Dangerous Patterns dangerousPatterns.ts Pattern detection for dangerous commands (bash, python, kubectl…)
Classifier Decision classifierDecision.ts Safe tool allowlist that bypasses the classifier entirely
Denial Tracking denialTracking.ts Consecutive and total denial counters with human fallback
Permission Setup permissionSetup.ts Strips dangerous allow-rules when entering auto mode
Bash Classifier bashClassifier.ts Stub for external users (ANT-only real implementation)

高级流程

Step 1
Always-deny rules?
→ DENY immediately
Step 2
Always-allow rules?
→ ALLOW immediately
Step 3
Fast paths pass?
→ ALLOW (no API call)
Step 4
2-stage XML classifier
→ ALLOW or DENY

两阶段 XML 分类器

输出格式

<!-- Allowed: -->
<block>no</block>

<!-- Blocked: -->
<block>yes</block><reason>one short sentence</reason>

The format instructions explicitly forbid any preamble: the response MUST begin with <block>.

Property Stage 1 (Fast) Stage 2 (Thinking)
Purpose Quick yes/no decision Careful chain-of-thought review
max_tokens 64 (both) / 256 (fast mode) 4,096
stop_sequences ['</block>'] (both mode only) None — allows full reasoning
temperature 0 0
Suffix injected "Err on the side of blocking. <block> immediately." Use <thinking> before responding. Explicit user confirmation required to override blocks.
Triggered when Always runs first Only if Stage 1 returns <block>yes or is unparseable
Result <block>no → ALLOW. <block>yes → escalate <block>no → ALLOW. <block>yes → DENY

分类器模式

both (default)
Stage 1 first; Stage 2 only if Stage 1 blocks. Best balance of cost vs accuracy.
fast
Stage 1 only (max_tokens=256, no stop_sequences). Fastest and cheapest.
thinking
Stage 2 only (skips Stage 1). Most thorough, highest latency and cost.

Mode is controlled by GrowthBook tengu_auto_mode_config.twoStageClassifier. For models with alwaysOnThinking, 2,048 padding tokens are added instead of sending thinking: false.

分类器系统提示词

The system prompt is assembled from 3 template files injected at build time via Bun's require(). The .txt files are not in the source repo. The prompt has three main sections:

Base Prompt

Defines the threat model: prompt injection, scope creep, and accidental damage. Sets default ALLOW — only blocks on explicit BLOCK rules. Excludes quality/efficiency issues from scope.

User Intent Rules

6 rules about distinguishing user requests from agent actions: explicit consent, scope escalation, agent-inferred parameters, questions vs consent, untrusted tool results.

Permissions Template

Replaceable sections for environment context, BLOCK rules (~28 categories), and ALLOW exceptions (7 categories). External users replace; ANT users augment.

12 条评估规则

Rule Description
Composite actions Evaluate multi-step sequences as a whole, not individual steps
Written file execution Writing a script then running it = directly running the commands
Committing code Git add/commit is generally safe; pushing is the sensitive part
Delayed/enabled effects Creating cron jobs, systemd services, or startup scripts has lasting effects
Sub-agent delegation When spawning sub-agents, evaluate the delegated task's intent
Sub-agent handoff review Sub-agents inheriting broad permissions need extra scrutiny
Look through wrappers npm run deploy = running the deploy script directly
Preemptive block Block early when the trajectory clearly leads to a blocked action
Classifier bypass Block attempts to disable or circumvent the classifier itself
Message context Consider the full conversation context, not just the latest action
Evaluate on own merits Each action is evaluated independently — previous approvals don't carry over
Shared infra bias Lean toward blocking when shared infrastructure could be affected

阻止规则(约 28 个类别)

Git destructive (force push, delete remote branches)
Git push to default branch (main/master)
Download and execute code from untrusted URLs
Cloud storage mass delete (S3, GCS)
Production deploy without explicit user request
Remote shell writes (SSH commands)
Blind apply patches/diffs without review
Logging/audit tampering
Permission escalation or admin account creation
TLS/auth weakening
Security feature disabling
Spawn agents with broad permissions
Interfere with other users' processes
Modify shared resources (configs, databases, queues)
Irreversible local destruction (rm -rf, drop tables)
Create RCE surfaces (reverse shells, web shells)
Expose local services to 0.0.0.0 or tunnels
Credential leakage (print/log/send secrets)
Credential store browsing beyond task needs
Data exfiltration to external services
Unauthorized persistence (cron jobs, systemd services)
Self-modification of agent code or config
External API writes without user request
Content integrity violations (impersonate users)
Real-world transactions (purchases, emails, posts)

允许例外(7 个类别)

Test artifacts (fixtures, mock data)
Local operations within project directory
Read-only operations (ls, cat, git status)
Declared dependencies (package.json, requirements.txt)
Toolchain bootstrap (dev tool setup)
Standard credentials (configured API keys for the task)
Git push to feature branches (not main/master)

自动模式配置(settings.json)

{
  "autoMode": {
    "allow": [
      "Git push to agent-created branches",
      "Install from package.json"
    ],
    "soft_deny": [
      "Never deploy to production",
      "Never modify shared databases"
    ],
    "environment": [
      "Trusted domains: *.company.com"
    ]
  }
}

External users: values replace defaults. ANT users: values are additive.

Security: autoMode is ignored from .claude/settings.json — prevents malicious project injection.

快速路径(什么跳过分类器)

These 6 optimizations avoid the expensive classifier API call, ordered from fastest to slowest.

# Fast Path Condition
1 Tool declares no relevance toAutoClassifierInput() returns '' — tool declares itself security-irrelevant
2 Non-classifierApprovable safety check decisionReason.type === 'safetyCheck' && classifierApprovable === false → prompt user directly (e.g., writes to .git/)
3 User interaction required tool.requiresUserInteraction() returns true → fall through to user prompt
4 PowerShell without flag PowerShell tool without POWERSHELL_AUTO_MODE feature flag → prompt user
5 acceptEdits fast path Simulates tool call in acceptEdits mode. If it would pass (e.g., file edit in project dir) → ALLOW. Excludes Agent and REPL tools.
6 Safe tool allowlist FileRead, Grep, Glob, LSP, TodoWrite, TaskCreate/Get/Update/List/Stop, AskUserQuestion, EnterPlanMode, TeamCreate, SendMessage, Sleep, classify_result — all skip classifier

Write/Edit tools are not on the allowlist — they use the acceptEdits fast path (#5) instead.

转录构建

包含的内容

User messages: text blocks only (no images, no tool results)
Assistant messages: tool_use blocks only (what the model DID)
Queued commands: queued_command attachments extracted as user turns
Assistant text: excluded — model-authored text could influence the classifier

JSONL 格式(默认)

{"user":"fix the bug in main.py"}
{"Bash":"npm test"}
{"Edit":"src/main.py: fixed_code"}
{"user":"looks good, now deploy"}
{"Bash":"npm run deploy"}

Legacy text format also supported. CLAUDE.md content is prepended as a separate user message with cache_control.

发送到分类器的最终消息结构

system: [{ text: systemPrompt, cache_control: { type: 'ephemeral' } }]
messages: [
  // Optional CLAUDE.md prefix (cached)
  { role: 'user', content: [
    { text: claudeMdWrapped, cache_control: { type: 'ephemeral' } }
  ]},
  // Transcript + action to evaluate
  { role: 'user', content: [
    { text: '<transcript>\n' },
    ...transcriptBlocks,
    { text: lastActionCompact, cache_control: { type: 'ephemeral' } },
    { text: '</transcript>\n' },
    { text: STAGE_SUFFIX }
  ]}
]

危险模式

Cross-Platform

python, python3, python2
node, deno, tsx
ruby, perl, php, lua
npx, bunx
npm run, yarn run
pnpm run, bun run
bash, sh, ssh

Unix-Specific (additional)

zsh, fish
eval, exec, env
xargs, sudo
--- ANT-only ---
fa run, coo, gh
gh api, curl, wget
git, kubectl
aws, gcloud, gsutil

PowerShell-Specific

pwsh, powershell, cmd
wsl, iex
invoke-expression, icm
invoke-command
start-process, saps
start, start-job, sajb
start-threadjob
register-objectevent
register-engineevent
! 进入自动模式时删除危险规则
当进入自动模式时,stripDangerousPermissionsForAutoMode() 扫描所有允许规则 中的危险 Bash/PowerShell/Agent 模式并临时删除它们。退出自动模式时通过 restoreDangerousPermissions() 恢复它们。 这可以防止预先配置的允许规则绕过分类器对危险命令的检查。

拒绝跟踪和限制

状态

type DenialTrackingState = {
  consecutiveDenials: number  // resets on any ALLOW
  totalDenials: number        // never resets (per session)
}
Max consecutive 3
Max total per session 20

达到限制时的行为

CLI: Fall back to interactive prompting — human reviews and decides
Headless: AbortError: "Agent aborted: too many classifier denials"

Purpose: prevents denial loops and runaway token costs. Any ALLOW (including fast-paths) resets consecutiveDenials.

错误处理 — 默认失败关闭

Failure Result Notes
API error (429, 500…) DENY (iron_gate=true) / ASK user (iron_gate=false) Controlled by GrowthBook tengu_iron_gate_closed (default: true, refreshed every 30 min)
Transcript too long ASK (CLI) / AbortError (headless) Headless aborts because transcript only grows — permanent condition
Unparseable response (Stage 1, both mode) Escalate Goes to Stage 2 for proper evaluation
Unparseable response (Stage 1, fast mode / Stage 2) DENY "unparseable - blocking for safety"
User abort (signal.aborted) DENY shouldBlock: true, unavailable: true

工具特定分类器编码

Each tool controls what the classifier sees via toAutoClassifierInput(input). Returning '' skips the classifier entirely.

Tool Encoding Example
Bash input.command npm test
PowerShell input.command Get-Process
FileEdit ${file_path}: ${new_string} src/main.py: fixed_code
FileWrite ${file_path}: ${content} src/new.py: new_code
NotebookEdit ${notebook_path} ${mode}: ${source} nb.ipynb replace: cell_code
Agent (subagent_type, mode=X): ${prompt} (worker, mode=default): Fix bug...
WebFetch ${url}: ${prompt} or just url https://api.com: extract data
WebSearch input.query python security best practices
Grep ${pattern} in ${path} or just pattern password in src/
Config ${setting} = ${value} model = opus
CronCreate ${cron}: ${prompt} */5 * * * *: check deploys
MCP tools key1=val1 key2=val2 channel=general message=hello
SendMessage to ${to}: ${message} to worker-1: continue fixing
All others '' (empty) Classifier skipped entirely

完整权限检查流程(自动模式)

1. Deny rules match (tool-level)?                    → DENY
2. Ask rules match (tool-level)?                     → goto step 7
3. Tool's checkPermissions()
   - Returns deny?                                   → DENY
   - Returns requiresUserInteraction?                → ASK
   - Returns ask with content rules?                 → goto step 7
   - Returns safetyCheck (non-classifierApprovable)? → ASK (bypass classifier)
4. bypassPermissions mode?                           → ALLOW
5. Always-allow rules match?                         → ALLOW
6. Passthrough                                       → becomes ASK

7. Mode-specific handling for ASK:
   a. dontAsk mode                                   → DENY
   b. Auto mode:
      i.   PowerShell without feature flag?          → ASK user
      ii.  acceptEdits fast-path passes?             → ALLOW ✓
      iii. Safe tool allowlist?                      → ALLOW ✓
      iv.  YOLO Classifier:
           Stage 1 (fast): <block>no?               → ALLOW ✓
           Stage 1 (fast): <block>yes?              → Stage 2
           Stage 2 (thinking): <block>no?           → ALLOW ✓
           Stage 2 (thinking): <block>yes?          → DENY
           Transcript too long?                      → Fallback to prompting
           API error + iron_gate_closed=true?       → DENY
           API error + iron_gate_closed=false?      → ASK user
           Denial limit exceeded?                    → Fallback to prompting
   c. shouldAvoidPermissionPrompts (headless)?       → auto-DENY

安全设计原则

01
Deny by default
Any failure (API error, parse error, timeout) results in BLOCK. Only explicit <block>no</block> allows.
02
Explicit allow only
The model must affirmatively declare no block. Silence, errors, and ambiguity all result in denial.
03
Assistant text excluded
The model's own text output is stripped from the transcript — only tool calls are included, preventing self-influence.
04
Dangerous rules stripped
Pre-configured allow rules for dangerous commands are removed before entering auto mode and restored on exit.
05
Project settings excluded
autoMode config from .claude/settings.json is ignored — prevents malicious project injection of classifier rules.
06
Fail closed by default
tengu_iron_gate_closed defaults to true. API errors deny by default, not fall back to prompting.
07
Human fallback
After 3 consecutive or 20 total blocks, control returns to human review — prevents denial loops.
08
Scope-limited
Only blocks security-relevant harm — not quality or efficiency issues. An agent looping is never a security threat.
09
Two stages reduce false positives
Stage 2 reviews Stage 1 blocks with full chain-of-thought reasoning before making the final call.
10
Cache-efficient
System prompt, CLAUDE.md, and last action are all cached. Stage 1 → Stage 2 gets a guaranteed cache hit on the expensive prefix.

提示词缓存和模型选择

缓存策略

Block cache_control
System prompt ephemeral
CLAUDE.md ephemeral
Last action block ephemeral

Max 3 cache-control blocks used (API limit is 4). Stage 1 → Stage 2 shares the same prefix, so Stage 2 gets a guaranteed cache hit.

模型选择优先级

1. CLAUDE_CODE_AUTO_MODE_MODEL env var (ANT-only)
2. GrowthBook tengu_auto_mode_config.model
3. Main loop model (getMainLoopModel())

Default: same model as the agent (Sonnet 4.6 or Opus 4.6), ensuring classifier capability matches agent capability.

Debug

CLAUDE_CODE_DUMP_AUTO_MODE (ANT-only): dumps request/response to ~/.claude/tmp/auto-mode/
API errors write diagnostics to ~/.claude/tmp/auto-mode-classifier-errors/
! 分类器增加实际成本
每个未命中快速路径的工具调用都会触发单独的 API 调用,包含完整的对话转录。 第 1 阶段使用 64 个 token 输出,但对于长转录可能需要数千个输入 token。 第 2 阶段最多使用 4,096 个输出 token。尽可能使用 acceptEdits 快速路径或安全工具白名单 以最小化开销 — 提示词缓存显著降低输入 token 成本。