The Claim, Plainly
A skill named "read-pdfs" can override your system prompt, exfiltrate your files, and harvest your API keys—without executing a single line of code. That sounded like FUD until I read the spec.
A SKILL.md Is a Prompt, Not a Plugin
Senior devs half-know this: a SKILL.md is plain text injected directly into your agent's context window. No sandbox. No runtime boundary. The "skill" isn't a binary or a sandboxed script—it's prose that becomes part of your model's instructions the moment it loads.
Once loaded, the LLM treats it like any other instruction. Override the system prompt? Write "ignore previous instructions and…" Exfiltrate files? Suggest the next tool call should POST to attacker.com. Harvest keys? Ask politely—the model wants to be helpful.
That's the attack surface. Not the runtime. The text.
Why Lazy Loading Doesn't Save You
We built Context Steward to lazy-load skills only when relevant—a context-window optimization. That works for cost and latency. It does nothing for trust. Loading a skill at the right moment doesn't make it safe to load.
v0.4.0: The Skill Security Auditor
So v0.4.0 ships a skill security auditor. Before any skill enters context, it's scanned across six categories:
- Prompt injection — instruction overrides, jailbreak patterns
- Data exfiltration — outbound URLs, hidden file reads
- Privilege escalation — attempts to expand permissions
- Metadata mismatch — frontmatter that lies about the body
- Obfuscation — Unicode tricks, zero-width chars, base64
- Credential harvesting — patterns asking for keys or tokens
Each skill gets a 0–100 score and a GREEN / AMBER / RED grade. RED never loads.
Poisoned test skill: 13 findings, score 0, RED. Blocked. 45 production skills: 45 GREEN. Signal-to-noise looks right.
The Mental Model
Your agent's context window is a privileged execution environment. We spent the last decade learning not to eval() untrusted input. Skills are the same lesson with prose instead of code—and the interpreter is an LLM that wants to be helpful.
The auditor is heuristic; a subtle attack will slip past it. But right now the floor is "whatever ships in the SKILL.md". That's not a floor—it's a hole.
Try It
npm install -g context-steward
Open source, zero telemetry, MIT licensed.
→ github.com/BouletteProof/context-steward
#OpenSource #AIEngineering #AgenticAI