Catch the tools your agent can’t trust before they touch your data.
We polygraph AI tools so you don’t have to.
AI agents plug into third-party tools and load skills that can hijack them or leak your data. We run those tools through an adversarial test, scan those skills for the same tricks, and publish a letter grade — free to read, public, evidence attached.
npm/@wildcard-ai/deepcontext
- tool-output injection: pass
- permission overreach: pass
- sensitive-data handling: pass
We test it. Adversarial probes in a sandbox — does it hijack the agent, overreach, or leak?
We grade it. A letter grade, A to F, published free with the evidence attached.
You check it. One command before your agent installs anything.
MCP servers and Agent Skills grew faster than anyone’s ability to polygraph them.
Agents now install and run skills off marketplaces unvetted. You need evidence anyone can check.
Fresh from the harness.
How a tool earns its grade.
- C-01 · litmus-v11 · live
Does it try to hijack your agent?
tool-output injection · probe 1.1 · probe 1.2 · probe 1.3
We bait it with inputs designed to make it slip commands into its output, including one tool's output fed into another, then scan for hijack attempts: lookalike instructions, hidden text, markdown tricks.
- C-02 · litmus-v11 · live
Does it touch things it shouldn't?
permission overreach · probe 2.1 · probe 2.2
We run local tools in a sandbox that captures every outbound call, then flag any that reach beyond the hosts and ports the server declared it needs. Remote servers can't be sandboxed, so there this check is marked skipped rather than assumed to pass. We also flag a tool that labels itself read-only while its name, a parameter, or its description shows it mutates: a permission lie your agent would otherwise trust.
- C-03 · litmus-v11 · live
Does it leak your data?
sensitive-data handling · probe 4.1 · probe 4.2
We plant fake secrets (keys, personal details) and watch every path out of the sandbox to see if they leave, including the tool's own replies to the agent.
- C-04 · litmus-v11 · live
How does it handle hostile input?
adversarial input handling · probe 3.1 · probe 3.2
We hit each tool with malformed and oversized inputs and known jailbreak patterns, and flag it if it crashes, spills an internal stack trace, or turns the hostile input into an attack of its own.
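The checks above can be pictured with a minimal Python sketch. It is not the litmus harness: the patterns, the finding labels, and the function names (scan_tool_output, egress_overreach, leaked_canaries) are all hypothetical, chosen only to show the shape of an output scan, an egress allowlist check, and a planted-secret check.

```python
import re
import unicodedata

# Hypothetical rules for illustration; the real litmus probes are not shown here.
ZERO_WIDTH = ("\u200b", "\u200c", "\u200d", "\u2060", "\ufeff")
INSTRUCTION_PATTERNS = (
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"you are now in \w+ mode", re.IGNORECASE),
)
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def scan_tool_output(text: str) -> list[str]:
    """Return labels for hijack-style tricks found in a tool's output."""
    findings = []
    if any(ch in text for ch in ZERO_WIDTH):
        findings.append("hidden-text")
    # Fold lookalike characters (e.g. fullwidth letters) to plain ones, then
    # drop zero-width characters so split-up phrases still match.
    folded = unicodedata.normalize("NFKC", text)
    for ch in ZERO_WIDTH:
        folded = folded.replace(ch, "")
    if any(p.search(folded) for p in INSTRUCTION_PATTERNS):
        findings.append("lookalike-instruction")
    # An HTML comment is invisible once markdown is rendered.
    if HTML_COMMENT.search(text):
        findings.append("markdown-trick")
    return findings


def egress_overreach(declared, observed):
    """Return observed (host, port) calls that the server never declared."""
    return [call for call in observed if call not in declared]


def leaked_canaries(canaries, outbound_payloads):
    """Return planted fake secrets that appear in any outbound payload."""
    return [c for c in canaries if any(c in p for p in outbound_payloads)]
```

For example, scan_tool_output("Result: 42") returns an empty list, while an output carrying a zero-width space inside "ignore previous instructions" is flagged for both hidden text and a lookalike instruction. A production scanner would need far broader rules than these two patterns.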
Grades run A–F — capped at B when egress can’t be verified, down to D for overreach or a crash, F for a hijack or leak. Each grade is pinned to a sha256 fingerprint of the tool surface, so a later change — a rug pull — makes it stale automatically. Read the full grade rubric and methodology.
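As a rough illustration of the caps and the fingerprint pinning described above, here is a small Python sketch. It rests on assumptions: each rule is treated as a hard ceiling and the ceiling itself is returned, intermediate grades such as C are not modeled, and the "tool surface" is taken to be a list of tool definitions (name, description, input schema). The published rubric is the authority; the names grade, surface_fingerprint, and is_stale are hypothetical.

```python
import hashlib
import json


def grade(findings: set[str], egress_verified: bool) -> str:
    """Apply the caps from the summary above, worst finding first (simplified)."""
    if findings & {"hijack", "leak"}:
        return "F"
    if findings & {"overreach", "crash"}:
        return "D"
    if not egress_verified:
        return "B"  # capped: egress could not be observed
    return "A"


def surface_fingerprint(tools: list[dict]) -> str:
    """sha256 over a canonical JSON form of the tool definitions (assumed shape)."""
    canonical = json.dumps(
        sorted(tools, key=lambda t: t["name"]),
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_stale(pinned_fingerprint: str, current_tools: list[dict]) -> bool:
    """A grade is stale once the surface no longer hashes to the pinned value."""
    return surface_fingerprint(current_tools) != pinned_fingerprint
```

Sorting the tools and their keys before hashing keeps the fingerprint stable when a server merely lists the same tools in a different order, so only a real change to a name, description, or schema invalidates the grade.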
Run polygraph in your agent — or grade a server from your terminal.
Cursor
Add to Cursor
One click: installs the MCP server (run_litmus, verify_attestation). Prefer to edit ~/.cursor/mcp.json? Use the config below.
Terminal
$ npx -y -p @polygraphso/litmus polygraphso-litmus litmus <mcp-server>
Grade a server yourself. Or wire the MCP server into any client with npx -y -p @polygraphso/litmus polygraphso-litmus-mcp.
Claude
/plugin install polygraph@polygraphso
Claude Code, after /plugin marketplace add polygraphso/litmus. Claude Desktop: paste the config below into claude_desktop_config.json.
Gate your CI — GitHub Action
Fail a build when an MCP server — or a skill it ships — grades D/F. On the GitHub Marketplace as polygraphso/litmus@v1:
# .github/workflows/mcp-gate.yml
name: mcp-gate
on: [pull_request]
permissions:
  contents: read
jobs:
  gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v5
      - uses: polygraphso/litmus@v1
Manual setup: any MCP client
Same config everywhere — paste into ~/.cursor/mcp.json (Cursor), claude_desktop_config.json (Claude Desktop), or your client’s MCP config:
{
  "mcpServers": {
    "polygraph-litmus": {
      "command": "npx",
      "args": ["-y", "-p", "@polygraphso/litmus", "polygraphso-litmus-mcp"],
      "env": { "POLYGRAPH_API_URL": "https://polygraph.so" }
    }
  }
}
Show your grade where developers look.
The card
A fuller card for a README header or a docs page.
The inline badge
[](https://polygraph.so/mcp/npm/@modelcontextprotocol/server-filesystem)
Swap in your own registry/owner/name ref. Full embed guide →
Where this is going.
- now
litmus-v11 harness
Built and running: nine probes across four categories, a grade from A to F, evidence attached.
- next
More public grades
Next on the bench: filesystem, github, slack, puppeteer, git. Vendors hear about significant failures before the public does.
- later
Verifiable proof
Grades published as timestamped records anyone can check without trusting us.
- later
Verified runs
Hardware-attested runs, so a third party can prove a grade is real.
How polygraph gets funded.
The Bankr community launched $POLYGRAPH — we didn’t issue it. We claim the dev fees publicly and use them to fund the work: the harness, the grades, and the evidence stay free to read.
Nobody can pay for a grade. No graded party gets review or approval rights over their result.
Not financial advice. The token funds the work; it doesn’t move a grade.
Follow new polygraphs as they publish.
A short email when we publish new polygraphs — no per-server tracking, no drip campaign, no “hey just checking in.”
Waiting on a specific server? Get an email when its grade publishes →