polygraph.so

Catch the agents you can’t trust before they touch your data.

We polygraph AI agents so you don’t have to.

Independent, lab-evaluated polygraphs for MCP servers and the agents that use them. Free public polygraphs. CLI for runtime checks.

install · cli v1 · npm
$ npx polygraph check <mcp-server>
  1. fetches the polygraph from polygraph.so
  2. returns polygraph: grade + last-tested date
  3. links to the full evidence report

Not yet polygraphed? The CLI returns queued, position #N and notifies you when the polygraph lands. The CLI is a lookup: probes run in our lab, not on your machine.
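
A minimal sketch of how a caller might gate on a lookup result, for example in CI. The CLI's actual output format is not shown on this page, so the `LookupResult` shape, the status labels, and the letter-grade scale below are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LookupResult:
    # Hypothetical shape: the page only says the CLI returns a grade, a
    # last-tested date, and a report link, or "queued, position #N".
    status: str                          # assumed labels: "graded" or "queued"
    grade: Optional[str] = None          # assumed letter scale, "A" is best
    last_tested: Optional[str] = None
    report_url: Optional[str] = None
    queue_position: Optional[int] = None


def allow_server(result: LookupResult, minimum_grade: str = "B") -> bool:
    """Return True only for a graded server at or above minimum_grade."""
    if result.status != "graded" or result.grade is None:
        # Queued or unknown servers are treated as untrusted for now.
        return False
    # Single letters compare alphabetically, so "A" <= "B" means A passes.
    return result.grade <= minimum_grade
```

Treating a queued server as not allowed is one possible policy; a team could equally choose to warn and continue until the polygraph lands.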

The MCP ecosystem grew faster than anyone’s ability to polygraph it.

Adoption metrics and dependency scans don’t tell you whether a server will exfiltrate your data or hijack the agent calling it. Frontier labs won’t independently polygraph the ecosystem they’re building on. You need an outside opinion.

5 probes. 3 categories. One sandbox.

Every probe runs in an isolated sandbox. Every result is reproducible and published with the evidence — not a star rating, the actual artifacts.
  1. C-01 · shipping v1

    Tool-output injection

    probe 1.1 · probe 1.2

    Does the server's output try to hijack the agent calling it? We feed it inputs that bait it into emitting injection-shaped text, then scan outputs for instruction mimicry, hidden unicode, and markdown tricks.
  2. C-02 · shipping v1

    Permission overreach

    probe 2.2

    Does it touch more than it claimed? In a no-expected-egress run, we flag any outbound network call. Phone-home detection on a default-deny network namespace.
  3. C-03 · shipping v1

    Sensitive data handling

    probe 4.1 · probe 4.2

    Does your data leave the sandbox when it shouldn't? We plant trackable markers (fake keys, distinctive PII strings) and watch every egress path plus the tool's own outputs back to the agent.
  4. C-04 · v2 · deferred

    Adversarial input handling

    How does it behave on malformed inputs, oversized payloads, and known jailbreak patterns? Deferred from v1 — the deterministic battery ships first; this category waits for the harness to mature.
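
The output scan described under C-01 can be sketched roughly as follows. This is an illustration under stated assumptions, not the lab's probe code: the regex patterns, the finding labels, and the `scan_tool_output` name are hypothetical, and the real rules live in the versioned spec.

```python
import re
import unicodedata
from typing import List

# Illustrative patterns only; a real battery would be far broader.
INSTRUCTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.I),
    re.compile(r"^\s*(system|assistant)\s*:", re.I | re.M),
]
# Markdown image whose URL carries a query string, a common exfiltration shape.
MARKDOWN_IMAGE = re.compile(r"!\[[^\]]*\]\(https?://[^)\s]*\?[^)\s]*\)")


def scan_tool_output(text: str) -> List[str]:
    """Return labels for injection-shaped content found in a tool's output."""
    findings = []
    if any(p.search(text) for p in INSTRUCTION_PATTERNS):
        findings.append("instruction-mimicry")
    # Unicode category "Cf" covers zero-width, bidi-control, and tag characters.
    if any(unicodedata.category(ch) == "Cf" for ch in text):
        findings.append("hidden-unicode")
    if MARKDOWN_IMAGE.search(text):
        findings.append("markdown-trick")
    return findings
```

Checks like these are deterministic, so the same captured output yields the same findings on every run.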

Probes evolve as agents do — new failure modes get new probes. The methodology is versioned and public. Read the v1 spec.
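
The marker planting described under C-03 can be sketched as below. The marker formats and the function names `make_canaries` and `find_leaks` are hypothetical, chosen only to illustrate the idea of unique, trackable strings checked against everything that leaves the sandbox.

```python
import secrets
from typing import Dict, Iterable, List


def make_canaries() -> Dict[str, str]:
    """Generate unique fake secrets that should never leave the sandbox."""
    tag = secrets.token_hex(8)
    return {
        "fake_api_key": f"sk-canary-{tag}",
        "fake_email": f"canary.{tag}@example.invalid",
    }


def find_leaks(canaries: Dict[str, str], captured: Iterable[str]) -> List[str]:
    """Return names of canaries seen in captured egress or tool outputs."""
    chunks = list(captured)  # materialize so every canary sees every chunk
    return [
        name
        for name, value in canaries.items()
        if any(value in chunk for chunk in chunks)
    ]
```

Because each run generates fresh random markers, a match in captured traffic points to that run's planted data rather than a coincidental string.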

Three orthogonal axes. Never averaged.

“Popular but dangerous” is a specific, valuable signal. We keep the axes apart so the signal stays sharp.
Table 1 — Trust framework, by question
#   Question                                      Status
01  Is this artifact well-made?                   existing
    Public registries · OpenSSF · GitHub
02  Does it behave well under pressure?           v1 polygraph · shipping
    Our sandbox
03  Does it stay behaving well in production?     next
    Runtime telemetry

We run axis 02 — the polygraph. We point at axes 01 and 03 — never average them in.
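
To see why the axes stay separate, consider this small sketch. The field names and the 0 to 1 scores are hypothetical (the page defines no numeric scale); they only illustrate how an average can hide a "popular but dangerous" server.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TrustProfile:
    # Hypothetical 0-1 scores, one per axis in Table 1.
    artifact_quality: float      # axis 01
    sandbox_behavior: float      # axis 02
    production_behavior: float   # axis 03

    def weakest_axis(self) -> str:
        """Name the lowest-scoring axis instead of blending the three."""
        scores = asdict(self)
        return min(scores, key=scores.get)


# Well-made and well-behaved in production, but it fails in the sandbox.
profile = TrustProfile(0.95, 0.20, 0.90)
average = sum(asdict(profile).values()) / 3
```

Here `average` comes out near 0.68, which reads as acceptable, while `profile.weakest_axis()` surfaces the sandbox failure directly.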

Get notified when polygraphs publish.

One email when polygraphs for the servers you care about land. No drip campaign, no “hey just checking in.”