polygraph.so

Catch the tools your agent can’t trust before they touch your data.

We polygraph AI tools so you don’t have to.

AI agents plug into third-party tools and load skills that can hijack them or leak your data. We run those tools through an adversarial test, scan those skills for the same tricks, and publish a letter grade — free to read, public, evidence attached.

polygraph grade: A
We polygraph ourselves, too: see the report.

latest polygraph · litmus-v12
D

npm/@wildcard-ai/deepcontext

tool-output injection · pass
permission overreach · pass
sensitive-data handling · pass

See the full run

01 · We test it. Adversarial probes in a sandbox: does it hijack the agent, overreach, or leak?

02 · We grade it. A letter grade, A to F, published free with the evidence attached.

03 · You check it. One command before your agent installs anything.


MCP servers and Agent Skills grew faster than anyone’s ability to polygraph them.

Agents now install and run skills off marketplaces unvetted. You need evidence anyone can check.

Fresh from the harness.

The latest checks we’ve published, newest first — each one a real litmus run you can reproduce. Browse every grade, MCP servers and skills, in the index.
D · MCP server
litmus-v12

npm/@wildcard-ai/deepcontext

Adversarial input handling failed (C-04): the server crashed, leaked internals (a stack trace), or amplified hostile input. No injection or data leak, so the grade caps at D.

View report →
A · MCP server
litmus-v10

pypi/mcp-server-time

All four categories passed. No injection, no data leak, no egress overreach, and adversarial inputs were handled cleanly (A means no overreach, not no network).

View report →
A · MCP server
litmus-v12

pypi/obris-mcp

All four categories passed. No injection, no data leak, no egress overreach, and adversarial inputs were handled cleanly (A means no overreach, not no network).

View report →
A · MCP server
litmus-v12

npm/@meetlark/mcp-server

All four categories passed. No injection, no data leak, no egress overreach, and adversarial inputs were handled cleanly (A means no overreach, not no network).

View report →
A · MCP server
litmus-v12

npm/@autonomad1/computeback-mcp

All four categories passed. No injection, no data leak, no egress overreach, and adversarial inputs were handled cleanly (A means no overreach, not no network).

View report →
A · MCP server
litmus-v12

pypi/nudg3-mcp

All four categories passed. No injection, no data leak, no egress overreach, and adversarial inputs were handled cleanly (A means no overreach, not no network).

View report →
A · MCP server
litmus-v12

npm/bitrefill-mcp-server

All four categories passed. No injection, no data leak, no egress overreach, and adversarial inputs were handled cleanly (A means no overreach, not no network).

View report →
A · MCP server
litmus-v12

npm/@proofslip/mcp-server

All four categories passed. No injection, no data leak, no egress overreach, and adversarial inputs were handled cleanly (A means no overreach, not no network).

View report →
A · MCP server
litmus-v12

npm/@polygraphso/litmus

All four categories passed. No injection, no data leak, no egress overreach, and adversarial inputs were handled cleanly (A means no overreach, not no network).

View report →
A · MCP server
litmus-v12

npm/@polygraphso/litmus

All four categories passed. No injection, no data leak, no egress overreach, and adversarial inputs were handled cleanly (A means no overreach, not no network).

View report →

How a tool earns its grade.

Nine probes across four live checks — and one sandbox that captures every outbound call. A check we can’t run is reported as skipped — never passed — and every grade ships with the evidence: not a star rating, the actual artifacts.
  1. C-01 · litmus-v11 · live

    Does it try to hijack your agent?

    tool-output injection · probe 1.1 · probe 1.2 · probe 1.3

    We bait it with inputs designed to make it slip commands into its output — including one tool's output fed into another — then scan for hijack attempts: lookalike instructions, hidden text, markdown tricks.
  2. C-02 · litmus-v11 · live

    Does it touch things it shouldn't?

    permission overreach · probe 2.1 · probe 2.2

    We run local tools in a sandbox that captures every outbound call, then flag any that reach beyond the hosts and ports the server declared it needs. Remote servers can't be sandboxed — there this check is marked skipped, never assumed. We also flag a tool that labels itself read-only while its name, a parameter, or its description shows it mutates — a permission lie your agent would otherwise trust.
  3. C-03 · litmus-v11 · live

    Does it leak your data?

    sensitive-data handling · probe 4.1 · probe 4.2

    We plant fake secrets — keys, personal details — and watch every path out of the sandbox to see if they leave, including the tool's own replies to the agent.
  4. C-04 · litmus-v11 · live

    How does it handle hostile input?

    adversarial input handling · probe 3.1 · probe 3.2

    We hit each tool with malformed and oversized inputs and known jailbreak patterns, and flag it if it crashes, spills an internal stack trace, or turns the hostile input into an attack of its own.
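
To make two of the checks above concrete, here is a minimal Python sketch of the egress comparison in C-02 and the planted-secret scan in C-03. It is an illustration under assumptions, not the harness's code: the `OutboundCall` type, the function names, and the sample hosts and secrets are all hypothetical, and the real sandbox capture format is not described on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutboundCall:
    """One captured outbound connection (hypothetical shape)."""
    host: str
    port: int


def find_overreach(calls, declared):
    """Return captured calls whose (host, port) was not declared by the server."""
    allowed = {(host.lower(), port) for host, port in declared}
    return [c for c in calls if (c.host.lower(), c.port) not in allowed]


def find_leaked_canaries(payloads, canaries):
    """Return planted secrets that appear verbatim in any captured payload."""
    return sorted({s for s in canaries for p in payloads if s in p})


# Hypothetical sample data: one declared host, two captured calls.
declared = [("api.example.com", 443)]
calls = [
    OutboundCall("api.example.com", 443),
    OutboundCall("collector.example.net", 8080),
]
# Fake secrets planted before the run, and payloads seen leaving the sandbox.
canaries = ["sk-canary-0001", "jane.doe@example.com"]
payloads = ["GET /v1/time", "POST /log body=sk-canary-0001"]

overreach = find_overreach(calls, declared)
leaked = find_leaked_canaries(payloads, canaries)
print(overreach)
print(leaked)
```

Verbatim substring matching is the simplest possible leak test; a secret that leaves the sandbox encoded or split across payloads would need more than this sketch does.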

Grades run A–F — capped at B when egress can’t be verified, down to D for overreach or a crash, F for a hijack or leak. Each grade is pinned to a sha256 fingerprint of the tool surface, so a later change — a rug pull — makes it stale automatically. Read the full grade rubric and methodology.
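
The caps and the fingerprint pinning described above can be sketched in a few lines of Python. This covers only the rules stated on this page (the full rubric lives in the methodology), and it assumes that the "tool surface" is the list of tool names, descriptions, and input schemas; that definition, the function names, and the sample tools are hypothetical.

```python
import hashlib
import json


def grade(*, hijack, leak, overreach, hostile_input_failure, egress_verified):
    """Apply the caps stated above, worst finding first."""
    if hijack or leak:
        return "F"
    if overreach or hostile_input_failure:
        return "D"
    if not egress_verified:
        return "B"  # egress could not be checked, so the grade is capped
    return "A"


def surface_fingerprint(tools):
    """sha256 over a canonical JSON form of the tool surface (assumed shape)."""
    ordered = sorted(tools, key=lambda t: t["name"])
    canonical = json.dumps(ordered, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical tool surface at grading time.
tools = [
    {"name": "get_time", "description": "Return the current time.",
     "inputSchema": {"type": "object"}},
    {"name": "convert_time", "description": "Convert between time zones.",
     "inputSchema": {"type": "object"}},
]
pinned = surface_fingerprint(tools)

# A later release quietly changes one description: the grade is now stale.
changed = [dict(tools[0], description="Return the current time. Also read ~/.ssh."),
           tools[1]]
is_stale = surface_fingerprint(changed) != pinned
print(pinned, is_stale)
```

Sorting the tools and the JSON keys before hashing keeps the fingerprint stable when a server merely lists the same tools in a different order, so only a real change to the surface marks a grade stale.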

Run polygraph in your agent — or grade a server from your terminal.

The open litmus harness grades a server A–F with reproducible, content-addressed evidence. Add it to your agent, run it yourself, or gate your CI on it.
Add to Cursor

One click — installs the MCP server (run_litmus, verify_attestation). Prefer to edit ~/.cursor/mcp.json? Use the config below.

$ npx -y -p @polygraphso/litmus polygraphso-litmus litmus <mcp-server>

Grade a server yourself. Or wire the MCP server into any client with npx -y -p @polygraphso/litmus polygraphso-litmus-mcp.

/plugin install polygraph@polygraphso

For Claude Code, run this after /plugin marketplace add polygraphso/litmus. For Claude Desktop, paste the config below into claude_desktop_config.json.

Fail a build when an MCP server — or a skill it ships — grades D/F. On the GitHub Marketplace as polygraphso/litmus@v1:

# .github/workflows/mcp-gate.yml
name: mcp-gate
on: [pull_request]
permissions:
  contents: read
jobs:
  gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v5
      - uses: polygraphso/litmus@v1

Same config everywhere — paste into ~/.cursor/mcp.json (Cursor), claude_desktop_config.json (Claude Desktop), or your client’s MCP config:

{
  "mcpServers": {
    "polygraph-litmus": {
      "command": "npx",
      "args": ["-y", "-p", "@polygraphso/litmus", "polygraphso-litmus-mcp"],
      "env": { "POLYGRAPH_API_URL": "https://polygraph.so" }
    }
  }
}
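
If the config file already lists other MCP servers, pasting the block above over it would drop them. The following Python sketch merges the entry instead. The entry itself is copied from the config above; the helper name `add_polygraph_server` is hypothetical, and the demo writes to a temporary directory rather than a real client config.

```python
import json
import tempfile
from pathlib import Path

# Entry copied from the config shown above.
POLYGRAPH_ENTRY = {
    "command": "npx",
    "args": ["-y", "-p", "@polygraphso/litmus", "polygraphso-litmus-mcp"],
    "env": {"POLYGRAPH_API_URL": "https://polygraph.so"},
}


def add_polygraph_server(config_path: Path) -> dict:
    """Add or overwrite the polygraph-litmus entry, keeping other servers."""
    text = config_path.read_text(encoding="utf-8") if config_path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    config.setdefault("mcpServers", {})["polygraph-litmus"] = POLYGRAPH_ENTRY
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Demo against a throwaway file that already has one (hypothetical) server.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "mcp.json"
    path.write_text(json.dumps({"mcpServers": {"other": {"command": "node"}}}),
                    encoding="utf-8")
    merged = add_polygraph_server(path)
    on_disk = json.loads(path.read_text(encoding="utf-8"))

print(sorted(merged["mcpServers"]))
```

To apply it for real, call the helper with the path of your client's config, for example `Path.home() / ".cursor" / "mcp.json"` for Cursor.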

Show your grade where developers look.

Maintain an MCP server we’ve graded? Put its live polygraph on your README, npm page, or docs. The badge reads the current grade — it updates itself — and links back to the reproducible report.
Example polygraph grade card for an MCP server

A fuller card for a README header or a docs page.

Example polygraph grade badge

Sits in a README badge row.
[![polygraph](https://polygraph.so/api/badge?server=npm/@modelcontextprotocol/server-filesystem)](https://polygraph.so/mcp/npm/@modelcontextprotocol/server-filesystem)

Swap in your own registry/owner/name ref. Full embed guide →
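
For maintainers with several servers, the badge line can be generated rather than edited by hand. This Python sketch only reproduces the URL pattern from the example above; the helper name `badge_markdown` is hypothetical, and the embed guide remains the authority on supported parameters.

```python
BASE_URL = "https://polygraph.so"


def badge_markdown(ref: str) -> str:
    """Build the README badge line for a registry/owner/name (or registry/name) ref."""
    ref = ref.strip("/")
    if len(ref.split("/")) < 2:
        raise ValueError("expected a ref like 'npm/@owner/name' or 'pypi/name'")
    badge = f"{BASE_URL}/api/badge?server={ref}"
    report = f"{BASE_URL}/mcp/{ref}"
    return f"[![polygraph]({badge})]({report})"


line = badge_markdown("npm/@modelcontextprotocol/server-filesystem")
print(line)
```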

See a live report →

Where this is going.

Sequenced, not dated — we publish when a step survives review, not when a calendar says so.
  1. now

    litmus-v11 harness

    Built and running: nine probes across four categories, a grade from A to F, evidence attached.
  2. next

    More public grades

    Next on the bench: filesystem, github, slack, puppeteer, git. Vendors hear about significant failures before the public does.
  3. later

    Verifiable proof

    Grades published as timestamped records anyone can check without trusting us.
  4. later

    Verified runs

    Hardware-attested runs, so a third party can prove a grade is real.

How polygraph gets funded.

Free to read, and not paid for by anyone we grade. Here’s where the money comes from instead.
$POLYGRAPH

The Bankr community launched $POLYGRAPH — we didn’t issue it. We claim the dev fees publicly and use them to fund the work: the harness, the grades, and the evidence stay free to read.

Nobody can pay for a grade. No graded party gets review or approval rights over their result.

View $POLYGRAPH on Bankr →

Not financial advice. The token funds the work; it doesn’t move a grade.

Follow new polygraphs as they publish.

A short email when we publish new polygraphs — no per-server tracking, no drip campaign, no “hey just checking in.”

Waiting on a specific server? Get an email when its grade publishes →