Methodology

TrustBench is a public registry of x402-style endpoints with nightly liveness telemetry and signed scorecards. This page documents exactly how the data is collected, how scores are computed, and what each metric represents — so anyone integrating against the registry knows what they're working with.

Data collection

Scoring

score = 15
      + 45 · successRate
      + 35 · latencyHealth        // max(0, min(1, 1 - p50 / 2000))
      +  3 · consistencyBonus     // max(0, min(1, 1 - jitter))
clamped to [40, 98]
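As an illustration, the formula above can be written as a small JavaScript function. This is a minimal sketch, not the registry's actual code. The function name computeScore, the parameter names, and the millisecond unit for p50 are assumptions. So is the treatment of jitter as an already-normalized number, since its definition is not given here.

```javascript
// Clamp x into the closed interval [lo, hi].
const clamp = (x, lo, hi) => Math.max(lo, Math.min(hi, x));

// Hypothetical helper mirroring the published formula.
//   successRate: fraction of successful probes, in [0, 1]
//   p50Ms:       median latency, assumed to be in milliseconds
//   jitter:      normalized variability measure (definition assumed)
function computeScore({ successRate, p50Ms, jitter }) {
  const latencyHealth = clamp(1 - p50Ms / 2000, 0, 1);
  const consistencyBonus = clamp(1 - jitter, 0, 1);
  const raw = 15 + 45 * successRate + 35 * latencyHealth + 3 * consistencyBonus;
  return clamp(raw, 40, 98);
}

console.log(computeScore({ successRate: 1, p50Ms: 500, jitter: 0.5 })); // 87.75
console.log(computeScore({ successRate: 0, p50Ms: 5000, jitter: 2 }));  // 40 (floor)
console.log(computeScore({ successRate: 1, p50Ms: 0, jitter: 0 }));     // 98 (ceiling)
```

The weights sum to 98 (15 + 45 + 35 + 3), so the upper clamp is reached only by a perfect endpoint. The lower clamp lifts any raw value below 40 up to 40.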

p50 and p95 latency are computed over successful probes only, using linear-interpolation percentiles. Timeouts count as failures toward reliability but are excluded from the latency calculation, so a failed probe does not skew the latency figures.
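The percentile step can be sketched as follows. The probe record shape ({ ok, latencyMs }) and the function names are hypothetical. The interpolation variant shown, rank = (n − 1) · q with linear blending between neighbors, is one common definition and is assumed here rather than confirmed. Computing successRate as successes divided by total probes is likewise an assumption.

```javascript
// Linear-interpolation percentile over an ascending-sorted array, q in [0, 1].
function percentile(sorted, q) {
  if (sorted.length === 0) return null; // no successful probes: latency undefined
  const rank = (sorted.length - 1) * q;
  const lo = Math.floor(rank);
  const hi = Math.ceil(rank);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (rank - lo);
}

// Failed probes lower successRate but never enter the latency sample.
function latencyStats(probes) {
  const ok = probes
    .filter((p) => p.ok)
    .map((p) => p.latencyMs)
    .sort((a, b) => a - b);
  return {
    successRate: probes.length ? ok.length / probes.length : 0,
    p50: percentile(ok, 0.5),
    p95: percentile(ok, 0.95),
  };
}

const stats = latencyStats([
  { ok: true, latencyMs: 100 },
  { ok: true, latencyMs: 200 },
  { ok: false, latencyMs: null }, // timeout: counted as a failure only
  { ok: true, latencyMs: 300 },
  { ok: true, latencyMs: 400 },
]);
console.log(stats); // successRate 0.8, p50 250, p95 approximately 385
```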

What each metric represents

Verifying a scorecard

Each entry returned by /rankings/paid includes signed_payload, signature, and signature_alg (ed25519 when the deployment has a published public key, hmac-sha256 as a fallback). The Ed25519 public key is served at /.well-known/trustbench-pubkey and can be used by anyone to verify a scorecard without contacting TrustBench:

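// Reference verifier (Node, ES module); also in scripts/verify-scorecard.js
// Assumes BASE is the deployment's base URL and sc is one entry from
// /rankings/paid whose signature_alg is 'ed25519'.
import crypto from 'node:crypto';

const pubPem = await (await fetch(BASE + '/.well-known/trustbench-pubkey')).text();
const publicKey = crypto.createPublicKey({ key: pubPem, format: 'pem' });

// Ed25519 takes no separate digest algorithm, so the first argument is null.
const valid = crypto.verify(
  null,
  Buffer.from(sc.signed_payload),
  publicKey,
  Buffer.from(sc.signature, 'base64')
);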
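To exercise the same verification call without network access, one option is a local round trip with a throwaway key pair. The sketch below is illustrative only. The verifyScorecard helper, the sample payload contents, and the choice to reject non-Ed25519 entries are assumptions, not part of TrustBench. It is written as CommonJS so it runs as a plain script.

```javascript
const crypto = require('node:crypto');

// Hypothetical helper: true only for an Ed25519-signed entry whose signature matches.
function verifyScorecard(sc, publicKey) {
  // hmac-sha256 entries need a shared secret, so they cannot be checked this way.
  if (sc.signature_alg !== 'ed25519') return false;
  return crypto.verify(
    null, // Ed25519 takes no separate digest algorithm
    Buffer.from(sc.signed_payload),
    publicKey,
    Buffer.from(sc.signature, 'base64')
  );
}

// Throwaway key pair standing in for the deployment's key (local test only).
const { publicKey, privateKey } = crypto.generateKeyPairSync('ed25519');

// Made-up payload; the real signed_payload format is not specified here.
const payload = JSON.stringify({ endpoint: 'https://example.invalid/api', score: 87.75 });
const sample = {
  signed_payload: payload,
  signature: crypto.sign(null, Buffer.from(payload), privateKey).toString('base64'),
  signature_alg: 'ed25519',
};

// Changing even one field of the payload should invalidate the signature.
const tampered = { ...sample, signed_payload: payload.replace('87.75', '98') };

console.log(verifyScorecard(sample, publicKey));   // true
console.log(verifyScorecard(tampered, publicKey)); // false
```

Against a live deployment, the publicKey argument would come from the PEM served at /.well-known/trustbench-pubkey instead of a locally generated pair.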

Roadmap

TrustBench is evolving from a public registry into a non-custodial smart router and payment-plumbing layer for agent commerce. The registry will continue to publish liveness telemetry; the next milestones are a capability-aware paid-probe layer and a non-custodial /route endpoint that constructs x402 transactions for agents to sign and returns a signed receipt. Full plan in TrustBench-strategy.md.
