Security Guidance plugin for Claude Code

What it proposes

An official Claude Code plugin that makes the agent review its own code changes for common vulnerabilities while it works, and fix them in the same session before they reach a PR. It is built entirely on hooks and runs at three escalating depths.

First, on each file edit, a PostToolUse hook pattern-matches the new content against known risky constructs (eval, new Function, os.system, child_process.exec, pickle, dangerouslySetInnerHTML, .innerHTML=, document.write, workflow-file paths). This is a pure string/regex match with no model call and no usage cost. Each pattern fires only once per file per session, and the rule set is extensible via .claude/security-patterns.yaml (up to 50 rules).

Second, at the end of each turn, a Stop hook computes a git diff of everything that changed (edits, Bash, subagents) and sends it to a separate background Claude review focused on security. This catches what a string match cannot: authz bypass, IDOR, SSRF, weak crypto, injection. It covers up to 30 changed files and fires at most three times in a row.

Third, on each commit or push that Claude itself makes through its Bash tool, a deeper agentic background review reads the surrounding code (callers, sanitizers, related files) to judge whether a finding is real. This layer is capped at 20 reviews per rolling hour.

The key design choice is review independence: the reviews run as separate Claude calls with fresh context, so the instance that wrote the code is not grading itself. No layer blocks writes or commits; findings reach the writing Claude as instructions. The model-backed layers use Opus 4.7 by default (overridable), and every layer can be disabled individually via env vars. The docs are explicit that this is one layer of defense in depth, not a complete security solution.
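To make the first, per-edit layer concrete, here is a minimal sketch of a regex match with once-per-pattern-per-file-per-session de-duplication. It is an illustration only, not the plugin's code: RULES, PatternLayer, and check are hypothetical names, the regexes are simplified stand-ins for the real rule set, and "session" is assumed to mean the lifetime of one PatternLayer object.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical, simplified rules; the plugin's actual patterns are not shown in this review.
RULES: Dict[str, "re.Pattern[str]"] = {
    "eval": re.compile(r"\beval\s*\("),
    "new-function": re.compile(r"\bnew\s+Function\s*\("),
    "os-system": re.compile(r"\bos\.system\s*\("),
    "inner-html": re.compile(r"\.innerHTML\s*="),
    "document-write": re.compile(r"\bdocument\.write\s*\("),
}


class PatternLayer:
    """Advisory matcher: reports each (file, rule) pair at most once per session."""

    def __init__(self, rules: Dict[str, "re.Pattern[str]"]) -> None:
        self.rules = rules
        # (path, rule name) pairs already reported during this session.
        self._seen: Set[Tuple[str, str]] = set()

    def check(self, path: str, new_content: str) -> List[str]:
        """Return the names of rules that newly match in this file."""
        hits: List[str] = []
        for name, pattern in self.rules.items():
            key = (path, name)
            if key in self._seen:
                continue  # already reported for this file, stay quiet
            if pattern.search(new_content):
                self._seen.add(key)
                hits.append(name)
        return hits
```

Because the check is a plain regex search, it costs no model usage. The de-duplication set is what keeps a file that legitimately uses a flagged construct from producing the same warning on every subsequent edit.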

Best used when

You run an AI coding agent heavily across many repos and want a cheap, always-on first pass that catches the obvious unsafe patterns before they ever reach review. It fits work that touches genuinely risky surface area: shell-exec and subprocess calls, deserialization, DOM-injection sinks in JS/TS front ends, server-side request construction, and CI workflow files. The free per-edit pattern layer alone is worth enabling for anyone, since it has no usage cost and catches the canonical footguns at the moment they are written. The model-backed layers pay off most when you commit through the agent rather than your own shell, and when the cost of a missed vulnerability in a public repo is high relative to the modest per-turn/per-commit usage spend. It suits a solo operator well: installation writes to user settings and loads in every new local session with nothing to invoke, and for shared or public repos the same config can be checked into .claude/settings.json so collaborators inherit it.

Poor fit when

The deepest model-backed layer, the commit review, fires only on commits and pushes the agent makes through its own Bash tool. If you habitually commit yourself, via the ! shell escape, or through a separate Git client, that layer never runs and you are left with the free pattern match plus the end-of-turn diff review.

It is also explicitly non-blocking: findings are instructions to the writing Claude, not gates. The review can miss an issue, and a finding it does raise can be argued away rather than fixed, so it cannot be relied on as the enforcement boundary for anything that actually matters. If you need hard enforcement you must still pair it with a blocking hook or a CI check.

The end-of-turn and commit reviews consume model usage like any request (the commit review is agentic and spends several turns), so on usage-metered plans a high-churn editing session adds real cost. For OAuth or subscription users there is no API key to configure: Claude Code passes the plan’s auth token (ANTHROPIC_AUTH_TOKEN) automatically, so the model layers begin spending plan credits on first run with zero setup. It also means the session’s code diff is sent to the Anthropic API on every turn that changed code. The per-layer disable env vars mitigate the cost but require deliberate tuning.

Finally, there are local dependencies. The whole plugin requires Python 3.8+ on PATH, and the YAML pattern-extension path needs PyYAML importable (JSON works on any Python). On the first session, if claude_agent_sdk is not already importable, a SessionStart bootstrap builds an isolated venv at ~/.claude/security/agent-sdk-venv and pip-installs claude-agent-sdk into it (a no-op when the SDK is already present), so the model-backed layers carry a one-time install step and a PyPI fetch on a fresh machine.
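For the hard-enforcement case mentioned above, a blocking check has to live outside the plugin. The following is a minimal sketch of one, assuming it is run from a git pre-commit hook or a CI step. BLOCKING, find_violations, and main are hypothetical names, the three regexes are illustrative only, and nothing here is part of the plugin.

```python
import re
import subprocess
import sys
from typing import List, Tuple

# Hypothetical deny-list; tune it to what must never ship in your repo.
BLOCKING = [
    ("eval", re.compile(r"\beval\s*\(")),
    ("os-system", re.compile(r"\bos\.system\s*\(")),
    ("pickle-load", re.compile(r"\bpickle\.loads?\s*\(")),
]


def find_violations(diff_text: str) -> List[Tuple[str, str, str]]:
    """Scan the added lines of a unified diff; return (file, rule, line) tuples."""
    violations: List[Tuple[str, str, str]] = []
    current_file = "?"
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            # "+++ b/path" names the file the following hunks modify.
            # Limitation: an added line whose text begins with "++ " is misread as a header.
            target = line[4:].strip()
            current_file = target[2:] if target.startswith("b/") else target
            continue
        if not line.startswith("+"):
            continue  # context, removed lines, and hunk headers are ignored
        added = line[1:]
        for name, pattern in BLOCKING:
            if pattern.search(added):
                violations.append((current_file, name, added.strip()))
    return violations


def main() -> int:
    """Check the staged diff; a non-zero return code is what blocks the commit."""
    diff = subprocess.run(
        ["git", "diff", "--cached", "--unified=0"],
        capture_output=True, text=True, check=True,
    ).stdout
    violations = find_violations(diff)
    for path, name, text in violations:
        print(f"{path}: blocked pattern '{name}': {text}", file=sys.stderr)
    return 1 if violations else 0
```

Saved as a script that ends with sys.exit(main()) and invoked from a pre-commit hook, this would refuse the commit whenever a deny-listed construct appears in staged additions, whoever made the commit. That exit code is the property the advisory layers lack.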

Verdict

Adopt, with the understanding that its real role is volume reduction, not enforcement. The free per-edit pattern layer is a no-brainer for any repo: it has zero cost and no model call, catches the canonical injection/deserialization/DOM footguns at the moment they are written, and is the kind of always-on hygiene that complements existing lint and git discipline. The model-backed end-of-turn and commit layers add genuine value for security-sensitive surfaces (shell exec, deserialization, request construction, public-repo workflow files). Their cost is bounded by caps and each layer can be disabled on its own, so usage exposure is controllable. The two caveats that keep this short of an unconditional endorsement are both stated plainly in the docs. First, the commit-review layer only fires on commits the agent makes itself, so anyone who commits manually loses that layer. Second, nothing here blocks, so it must sit inside a defense-in-depth stack (its own framing: in-session, then on-demand /security-review, then PR-time Code Review, then CI scanners) rather than replacing static analysis, dependency scanning, or a blocking pre-commit hook for the things that genuinely cannot ship broken. Treated as a cheap, official, locally running early filter that reduces what reaches the slower stages, it earns its place; treated as a security guarantee, it would be a liability.