Methodology

TrustBench is a public registry of x402-style endpoints with nightly liveness telemetry and signed scorecards. This page documents exactly how the data is collected, how scores are computed, and what each metric represents, so that anyone integrating against the registry knows what the numbers do and do not cover.

Data collection

A scheduled job runs once per day on a single cloud host.
For each provider URL, the prober sends three sequential requests per run. The three samples are tagged us-east / eu-west / asia-southeast for variance accounting, but the tags are labels only: all three requests originate from the same host today. Multi-host probing is on the roadmap.
Each request is a HEAD with an 8-second timeout, falling back to GET if the server returns 405.
HTTP status codes 200, 201, 204, 401, 402, 403, 404, 405, 429 are recorded as "endpoint is alive." Other statuses, connection errors, and timeouts are recorded as failures.
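
The probe rules above can be sketched as follows. This is a minimal illustration, not the production prober: the names isAlive and probeOnce, the injectable fetchImpl parameter, and the choice to measure latency across both requests when the GET fallback fires are all assumptions made for the example.

```javascript
// Statuses the methodology records as "endpoint is alive".
const ALIVE_STATUSES = new Set([200, 201, 204, 401, 402, 403, 404, 405, 429]);

function isAlive(status) {
  return ALIVE_STATUSES.has(status);
}

// One probe: HEAD with an 8-second timeout, retried as GET if the server
// answers 405. fetchImpl is injectable (hypothetical parameter) so the
// logic can be exercised without a network.
async function probeOnce(url, fetchImpl = fetch, timeoutMs = 8000) {
  const started = Date.now();
  try {
    let res = await fetchImpl(url, {
      method: 'HEAD',
      signal: AbortSignal.timeout(timeoutMs),
    });
    if (res.status === 405) {
      res = await fetchImpl(url, {
        method: 'GET',
        signal: AbortSignal.timeout(timeoutMs),
      });
    }
    // Assumption: latency spans the whole probe, including any fallback.
    return { ok: isAlive(res.status), status: res.status, latencyMs: Date.now() - started };
  } catch (err) {
    // Connection errors and timeouts are failures with no latency sample.
    return { ok: false, status: null, latencyMs: null };
  }
}
```

Calling probeOnce three times in sequence for a URL would yield the three samples described above. A 500 response returns normally but is classified as a failure, while a 402 counts as alive.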

Scoring

score = 15
      + 45 · successRate
      + 35 · latencyHealth        // max(0, min(1, 1 - p50 / 2000))
      +  3 · consistencyBonus     // max(0, min(1, 1 - jitter))
clamped to [40, 98]
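
As a worked example, the formula can be written as a small function. This is a sketch under stated assumptions: the function name computeScore and its argument shape are hypothetical, p50 is taken to be in milliseconds, jitter is taken to be an already-normalized value in [0, 1], and no rounding is applied because the formula does not specify any.

```javascript
// Clamp a value into [0, 1], as in the latencyHealth and consistencyBonus terms.
const clamp01 = (x) => Math.max(0, Math.min(1, x));

// Hypothetical helper mirroring the published formula term by term.
function computeScore({ successRate, p50Ms, jitter }) {
  const latencyHealth = clamp01(1 - p50Ms / 2000);
  const consistencyBonus = clamp01(1 - jitter);
  const raw = 15 + 45 * successRate + 35 * latencyHealth + 3 * consistencyBonus;
  // Final clamp to [40, 98].
  return Math.max(40, Math.min(98, raw));
}

// All probes succeed, p50 of 400 ms, jitter of 0.2:
// 15 + 45 * 1 + 35 * 0.8 + 3 * 0.8, which is approximately 90.4.
console.log(computeScore({ successRate: 1, p50Ms: 400, jitter: 0.2 }));
```

Two consequences follow from the constants. The weights sum to 98, so the upper clamp only guards against rounding drift. The unclamped minimum is 15, so the floor of 40 is what an endpoint receives when every term except the base is zero.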

p50 and p95 latency are computed over successful probes only, using linear-interpolation percentiles. Failed probes, including timeouts, lower successRate but are excluded from the latency calculation, so a timeout does not inflate the latency figures.
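
The separation between reliability and latency can be illustrated with a short sketch. The helper names percentile and summarize and the probe record shape are hypothetical. Linear interpolation also has several variants; the one below interpolates between the two nearest ranks on a zero-based (n - 1) scale, which is an assumption rather than a confirmed detail of the scorer.

```javascript
// Linear-interpolation percentile over an unsorted list of numbers.
// Assumption: rank = p/100 * (n - 1), interpolating between neighbours.
function percentile(values, p) {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = (p / 100) * (sorted.length - 1);
  const lo = Math.floor(rank);
  const hi = Math.ceil(rank);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (rank - lo);
}

// Failures count against successRate but contribute no latency sample.
function summarize(probes) {
  const latencies = probes.filter((p) => p.ok).map((p) => p.latencyMs);
  return {
    successRate: probes.length === 0 ? 0 : latencies.length / probes.length,
    p50Ms: percentile(latencies, 50),
    p95Ms: percentile(latencies, 95),
  };
}

// Three successes and one timeout: successRate is 0.75, and the
// percentiles are computed from 120, 180, and 300 ms only.
console.log(summarize([
  { ok: true, latencyMs: 120 },
  { ok: true, latencyMs: 300 },
  { ok: false, latencyMs: null },
  { ok: true, latencyMs: 180 },
]));
```

With this variant the median of the three successful samples is 180 ms, and the timeout affects only successRate.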

What each metric represents

Score reflects reachability and response time, not capability quality. A 4xx response from the accepted list, such as 401, 402, or 429, confirms the endpoint is up and responding, but does not confirm the underlying API behaves correctly when authenticated and paid.
Latency is single-origin. All measurements come from one host today, so real-world latency from an agent's location will differ. Multi-host measurement is planned.
Payment behavior is not yet measured. The current probe does not execute x402 payments, observe settlement latency, or validate payment-gated responses. A capability-aware paid-probe layer ships alongside the router.
Scorecards are signed with Ed25519. The public key is served at /.well-known/trustbench-pubkey so any third party can verify a TrustBench scorecard independently. See "Verifying a scorecard" below.

Verifying a scorecard

Each entry returned by /rankings/paid includes signed_payload, signature, and signature_alg (ed25519 when the deployment has a published public key, hmac-sha256 as a fallback). Only ed25519 scorecards can be verified by third parties, because checking an HMAC requires the shared secret. The Ed25519 public key is served at /.well-known/trustbench-pubkey; once fetched, it lets anyone verify a scorecard without further contact with TrustBench:

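// Reference verifier (Node 18+, ES module); also in scripts/verify-scorecard.js
import crypto from 'node:crypto';

// BASE is the registry origin and sc is one entry from /rankings/paid
// whose signature_alg is "ed25519". Both must be defined before this runs.
const pubPem = await (await fetch(BASE + '/.well-known/trustbench-pubkey')).text();
const publicKey = crypto.createPublicKey({ key: pubPem, format: 'pem' });

// Ed25519 uses no separate digest, so the algorithm argument is null.
const valid = crypto.verify(
  null,
  Buffer.from(sc.signed_payload),
  publicKey,
  Buffer.from(sc.signature, 'base64')
);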
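
A verifier will usually want to check signature_alg before trusting the result. The sketch below is one way to do that and is not part of the reference script: the function name verifyBySignatureAlg, the returned object shape, and the sample payload fields are hypothetical. It uses CommonJS require so it runs as a plain script, and it signs a sample with a throwaway key pair so the round trip can be tried locally without the registry.

```javascript
const crypto = require('node:crypto');

// Returns { verified, reason }. Only ed25519 can be checked with the
// public key alone; hmac-sha256 would need the deployment's secret.
function verifyBySignatureAlg(sc, publicKey) {
  if (sc.signature_alg !== 'ed25519') {
    return { verified: false, reason: 'not publicly verifiable: ' + sc.signature_alg };
  }
  const ok = crypto.verify(
    null, // Ed25519 takes no digest algorithm
    Buffer.from(sc.signed_payload),
    publicKey,
    Buffer.from(sc.signature, 'base64')
  );
  return { verified: ok, reason: ok ? 'ok' : 'signature mismatch' };
}

// Local round trip with a throwaway key pair (illustrative data only).
const { publicKey, privateKey } = crypto.generateKeyPairSync('ed25519');
const payload = JSON.stringify({ provider: 'https://example.com', score: 90.4 });
const sample = {
  signed_payload: payload,
  signature: crypto.sign(null, Buffer.from(payload), privateKey).toString('base64'),
  signature_alg: 'ed25519',
};

console.log(verifyBySignatureAlg(sample, publicKey));
console.log(verifyBySignatureAlg({ ...sample, signed_payload: payload + ' ' }, publicKey));
```

The first call should report a verified scorecard and the second a signature mismatch, since any change to signed_payload invalidates the signature.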

Roadmap

TrustBench is evolving from a public registry into a non-custodial smart router and payment-plumbing layer for agent commerce. The registry will continue to publish liveness telemetry; the next milestones are a capability-aware paid-probe layer and a non-custodial /route endpoint that constructs x402 transactions for agents to sign and returns a signed receipt. Full plan in TrustBench-strategy.md.
