Blog · 14 May 2026 · 5 min read

M2M authentication at 8.5 ns local, 25 ms over public internet

Why we benchmarked 10 000 authenticated requests over the public internet at 25 ms p50

Phase 0 (engine bench on a laptop) and Phase 1 (live two-box HTTPS demo over the public internet) of EdSSA — Ephemeral Decentralised Stateless Structural Authentication. Patent pending in Europe; mechanism details remain confidential, but what we measured doesn't.

Figure: EdSSA per-credential verification time benchmark. Log-scale bar chart comparing four EdSSA widths against AES-256-GCM, HMAC-SHA-256, RSA-2048 and Ed25519.

For machine-to-machine authentication at scale, the bottleneck has always been the same: every request waits on a central authority. OAuth servers, certificate authorities, secrets managers — they pause the request path while they decide whether to accept you. Now layer in the post-quantum migration and AI agents firing millions of requests per second, and the existing model gets harder.

We've been quietly building an alternative: EdSSA — Ephemeral Decentralised Stateless Structural Authentication. Patent pending in Europe (priority filed 2026-05-01). This week we got it running end-to-end. Here's what we observed.

Phase 0 — engine benchmark, on a laptop

We measured the verification core on an Apple MacBook Neo laptop (Apple A18 Pro chip) with Criterion. Each result includes a lock-free atomic state read, so it reflects the full hot path rather than a contrived best case. (See the chart above.)

Credential width	Strength bracket	Verify time	Throughput per core
32 B	256 bits (AES-256 family)	8.48 ns	117 M tokens / s
64 B	512 bits (SHA-512 family)	22 ns	46 M tokens / s
256 B	2 048 bits (RSA-2048 strength)	112 ns	2.13 GiB / s
2 048 B	16 384 bits	882 ns	2.16 GiB / s
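
The throughput column follows directly from the verify times. As a minimal sketch of that arithmetic (the function names are ours for illustration and are not part of the EdSSA codebase), one credential per verify on a single core gives:

```rust
/// Credentials verified per second on one core, given the per-verify latency.
fn tokens_per_sec(verify_ns: f64) -> f64 {
    1e9 / verify_ns
}

/// Payload throughput in GiB/s for a credential of `width_bytes`.
fn gib_per_sec(width_bytes: f64, verify_ns: f64) -> f64 {
    let bytes_per_sec = width_bytes * 1e9 / verify_ns;
    bytes_per_sec / (1024.0 * 1024.0 * 1024.0)
}

fn main() {
    // (width in bytes, verify time in ns), copied from the table above.
    let rows = [(32.0, 8.48), (64.0, 22.0), (256.0, 112.0), (2048.0, 882.0)];
    for &(width, ns) in rows.iter() {
        println!(
            "{:>5} B: {:>6.1} M tokens/s, {:.2} GiB/s, {:.3} ns/byte",
            width,
            tokens_per_sec(ns) / 1e6,
            gib_per_sec(width, ns),
            ns / width
        );
    }
}
```

The two narrow rows are quoted in tokens per second and the two wide rows in GiB per second; the sketch prints both units for every row so they can be compared on one scale.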

The reference points matter:

AES-256-GCM tag verification on the same hardware sits around 10 ns per small block — we're in the same league at 256-bit strength.
RSA-2048 signature verification sits at roughly 30 µs. We verify a 2048-bit-strength credential in 0.112 µs. That's a ~270× speed-up at the same cryptographic strength bracket, without doing any per-request asymmetric crypto. Engine-only. With all production controls active (rate-limit, XFF chain resolution, region routing, per-tenant metering emit), the same verifier-side compute path lands at ~332 ns — a ~90× speed-up at the same strength bracket with the full wrapper stack on, sustaining ~3 million authenticated requests per second per core. See the "Update — figures with production controls in the loop" section below for the per-component breakdown.
Linear scaling out to 16 384 bits: same engine, no architectural change, ~0.43 ns per byte. The const-generic foundation absorbs payload size without dynamic dispatch. At 16 384 bits, the credential is eight times wider than the RSA-2048 bracket.
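
To illustrate what "const-generic, no dynamic dispatch" means in Rust, here is a small stand-in. It is not the EdSSA verifier (that mechanism is confidential); `fixed_width_eq` is a hypothetical helper that compares two fixed-width byte arrays without an early exit. The point is only that the width `N` is a compile-time constant, so one generic function covers 32 B through 2 048 B.

```rust
/// Compare two fixed-width byte arrays without returning early on a mismatch.
/// `N` is a compile-time constant: each width used gets its own monomorphised
/// copy, so no trait object or runtime dispatch is involved.
/// Hypothetical helper for illustration; not the EdSSA verification routine.
fn fixed_width_eq<const N: usize>(expected: &[u8; N], presented: &[u8; N]) -> bool {
    let mut diff = 0u8;
    for i in 0..N {
        // Accumulate every differing bit instead of branching per byte.
        diff |= expected[i] ^ presented[i];
    }
    diff == 0
}

fn main() {
    // 32-byte width: identical arrays match, a single flipped bit does not.
    let narrow = [0xABu8; 32];
    let mut tampered = narrow;
    assert!(fixed_width_eq(&narrow, &tampered));
    tampered[31] ^= 1;
    assert!(!fixed_width_eq(&narrow, &tampered));

    // 2 048-byte width: same function, different compile-time N.
    let wide = [0x5Au8; 2048];
    assert!(fixed_width_eq(&wide, &wide));
    println!("all widths compared");
}
```

Passing arrays of different widths to the same call is a compile error rather than a runtime check, which is one reason a const-generic design can scale the payload size without changing the call path.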

Phase 1 — live, two boxes, public internet

Then we shipped it. A laptop on coffee-shop wifi authenticated 10 000 consecutive HTTPS requests against a small EU-hosted VPS in Helsinki. Caddy fronted the box with a Let's Encrypt cert (DNS-01 wildcard). A Rust sidecar verified each credential and forwarded to a trivial upstream.

Metric	Result
Total requests	10 000
Accepts	10 000 (100 %)
Rejects on the happy path	0
Client p50 (public internet RTT-dominated)	25.30 ms
Client p95 / p99	35.39 / 97.26 ms
Server-side mean (verify + forward, full HTTP stack)	225 µs

The 25 ms p50 is network round-trip; the 225 µs server-side mean is dominated by Axum + reqwest plumbing. The actual cryptographic verification cost is invisible against either.
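
For readers who want to reproduce percentile figures like these from their own request logs, a minimal sketch using the nearest-rank method is below. The post does not say which percentile method the client harness used, so treat this as one reasonable choice; the sample latencies in `main` are made up for illustration.

```rust
/// Nearest-rank percentile over samples sorted in ascending order: the value
/// at rank ceil(p/100 * n), counted from 1. `sorted` must be non-empty.
fn percentile(sorted: &[f64], p: f64) -> f64 {
    let n = sorted.len();
    let rank = ((p / 100.0) * n as f64).ceil() as usize;
    // Clamp so that p = 0 maps to the first sample and p = 100 to the last.
    sorted[rank.clamp(1, n) - 1]
}

fn main() {
    // Made-up client latencies in milliseconds, for illustration only.
    let mut samples = vec![24.1, 25.3, 23.9, 97.0, 26.4, 25.0, 35.2, 24.8, 25.6, 27.1];
    samples.sort_by(|a, b| a.partial_cmp(b).expect("no NaN in samples"));

    for p in [50.0, 95.0, 99.0] {
        println!("p{} = {} ms", p, percentile(&samples, p));
    }
}
```

With only a handful of samples the tail percentiles collapse onto the slowest request, which is why a run of 10 000 requests is needed before p99 says anything useful.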

What "patent pending" covers (without saying too much)

The European application is filed; the 12-month priority window stays open until May 2027. The umbrella term is Structural Authentication: a paradigm in which two parties independently arrive at identical ephemeral credentials at request time, without round-tripping a centralised authority and without consensus among multiple nodes. The architecture is post-quantum (using ML-KEM / NIST FIPS 203 for the initial key agreement) and keeps all state in volatile memory: nothing per-credential touches disk.

What's safe to share publicly today:

It works at nanosecond timescales for the verifier hot path.
It runs on commodity hardware. No specialised silicon required, no co-processor, no GPU.
The cost surface is operator-grade, not enterprise-license-grade.
Mechanism details — claim 1, the construction rules, the schema model — remain confidential until the priority window closes and the application publishes.

What's next

The build queue past Phase 1 lines up roughly as: horizontal-scaling variants → integration with public synchronisation sources → hardened replay defence → regulator-grade audit emission → a live control panel. All targeting a Community Edition release in 2026 with a small-business-friendly source-available license.

Strategically, the asks are simple:

We're talking to a small set of Charter Customers ahead of public release. The target verticals are heavy on M2M traffic — high-frequency trading, agent-to-agent AI inference, IoT and edge fleets, regulated logistics, telco/automotive/aviation control planes. Generous founder-priced contracts for the first ten enterprises that ship to production with us.
If you're a researcher, engineer, or technically minded operator in one of those verticals and want to compare notes (NDA-gated, no pitch needed), the founder's direct line is tw@edssa.io.

Update — figures with production controls in the loop

26 May 2026.

The original Phase-0 numbers above are engine-only: verify_token plus an ArcSwap atomic state read, with nothing else on the request path. That's the right figure for "how fast can the cryptographic core actually go," and it stands at 8.63 ns at the 256-bit bracket on the bench machine today. The 0.15 ns gap to the 8.48 ns we published on 14 May (under 2 %) is within run-to-run variance, so we treat it as the same number.

Since publication, Phase 9 added per-tenant rate limiting, X-Forwarded-For chain resolution, and region-routing substrate around the verify call. Phase 10 added per-tenant metering emit on the audit path. Those wrappers weren't measured at the time the post went live. They are now.

Steady-state happy path — hot tenant, Cloudflare 1-hop, home region, 2048-bit credential:

Component (Apple A18 Pro, Criterion median)	Cost
`verify_token (full hot path) — Enterprise` (N=256, 2048-bit)	111.69 ns
`ratelimit (hot key, allowed)`	40.57 ns
`xff_resolver (1-hop trusted)`	53.12 ns
`region_routing::contains (hit, home)`	2.63 ns
`metering_emit` (existing tenant, audit path)	123.80 ns
Total verifier-side compute, full production controls	~332 ns

Against RSA-2048 signature verification (~30 µs / 30 000 ns on the same hardware), that lands at ~90× speed-up at the same cryptographic strength bracket — with rate-limit, multi-tenant accounting, region pinning, and the audit emitter all active. The original post quoted ~270× against the bare engine; the controlled figure is lower because the comparison is now denominator-honest. The structural property holds: the verifier-side compute path is sub-microsecond with full production wrappers on, and per-core throughput is ~3.0 million authenticated requests per second.
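
The headline total, ratio, and per-core throughput can be re-derived from the component table. The sketch below does only that arithmetic: the constants are the Criterion medians and the ~30 µs RSA-2048 figure quoted in this post, and the function names are ours, not identifiers from the bench harness.

```rust
/// Criterion medians in nanoseconds, copied from the component table above.
const COMPONENTS_NS: [(&str, f64); 5] = [
    ("verify_token (full hot path)", 111.69),
    ("ratelimit (hot key, allowed)", 40.57),
    ("xff_resolver (1-hop trusted)", 53.12),
    ("region_routing::contains (hit, home)", 2.63),
    ("metering_emit", 123.80),
];

/// Approximate RSA-2048 signature verification cost quoted in the post.
const RSA_2048_VERIFY_NS: f64 = 30_000.0;

/// Sum of all per-request component costs, in nanoseconds.
fn total_ns(components: &[(&str, f64)]) -> f64 {
    components.iter().map(|&(_, ns)| ns).sum()
}

/// How many times faster than RSA-2048 verification a given total is.
fn speedup_vs_rsa(total: f64) -> f64 {
    RSA_2048_VERIFY_NS / total
}

/// Requests per second one core sustains if each request costs `total` ns.
fn requests_per_sec(total: f64) -> f64 {
    1e9 / total
}

fn main() {
    let total = total_ns(&COMPONENTS_NS);
    println!("total compute: {:.2} ns", total);
    println!("speed-up vs RSA-2048: {:.1}x", speedup_vs_rsa(total));
    println!("per-core throughput: {:.2} M req/s", requests_per_sec(total) / 1e6);
}
```

The throughput figure assumes the core does nothing but this compute path back to back, which matches how a single-thread microbenchmark is measured rather than how a loaded production host behaves.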

The two-box public-internet figures from May 2026 are unchanged: the 25.30 ms client-side p50 is RTT-dominated and the ~225 µs server-side mean is HTTP-stack-dominated. Adding ~220 ns of new wrapping cost (the ~332 ns total minus the 111.69 ns engine figure) to a 225 µs HTTP mean is a 0.1 % shift, below the measurement noise floor.

Other deployment profiles

The headline row is "hot tenant, Cloudflare 1-hop, home region." Two profiles worth quoting separately:

Profile	Total compute	Ratio vs RSA-2048
Boot-default unlimited fleet (low-pressure deployments)¹	~310 ns	~97×
GCP/AWS 3-hop NLB chain (worst-typical XFF depth)²	~409 ns	~73×

¹ `ratelimit (unlimited short-circuit)` costs 18.72 ns instead of 40.57 ns. Most fleets that haven't enabled per-tenant burst limits hit this path.
² `xff_resolver (3-hop trusted)` costs 130.16 ns instead of 53.12 ns. NLB → LB → sidecar chains add cost; the chain is parsed rightmost-first against trusted CIDRs.
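
For readers unfamiliar with rightmost-first X-Forwarded-For resolution, the general technique looks roughly like the sketch below. This is a generic illustration and not the `xff_resolver` source: `Cidr` and `resolve_client` are hypothetical names, and the sketch handles IPv4 only to stay short.

```rust
use std::net::Ipv4Addr;

/// An IPv4 CIDR block such as 10.0.0.0/8 (hypothetical helper type).
struct Cidr {
    network: u32,
    mask: u32,
}

impl Cidr {
    fn new(addr: Ipv4Addr, prefix_len: u32) -> Self {
        assert!(prefix_len <= 32);
        // A shift by 32 would overflow, so /0 is handled separately.
        let mask = if prefix_len == 0 { 0 } else { u32::MAX << (32 - prefix_len) };
        Cidr { network: u32::from(addr) & mask, mask }
    }

    fn contains(&self, addr: Ipv4Addr) -> bool {
        (u32::from(addr) & self.mask) == self.network
    }
}

/// Walk the chain right to left, starting from the TCP peer. Each trusted hop
/// is skipped in favour of the entry to its left; the first untrusted address
/// is treated as the client. Returns None if an entry that has to be examined
/// fails to parse as an IPv4 address.
fn resolve_client(peer: Ipv4Addr, xff: &str, trusted: &[Cidr]) -> Option<Ipv4Addr> {
    let is_trusted = |ip: Ipv4Addr| trusted.iter().any(|c| c.contains(ip));
    let mut candidate = peer;
    for entry in xff.rsplit(',').map(str::trim).filter(|e| !e.is_empty()) {
        if !is_trusted(candidate) {
            break; // Anything further left could have been set by the client.
        }
        candidate = entry.parse().ok()?;
    }
    Some(candidate)
}

fn main() {
    let trusted = [Cidr::new(Ipv4Addr::new(10, 0, 0, 0), 8)];
    let peer = Ipv4Addr::new(10, 0, 0, 1);
    // Two trusted proxies appended their hops; the leftmost entry is spoofed.
    let header = "1.2.3.4, 203.0.113.7, 10.0.0.5";
    println!("{:?}", resolve_client(peer, header, &trusted));
}
```

Each additional trusted hop costs one more parse and one more CIDR lookup, which is consistent with the 3-hop profile being more expensive than the 1-hop one. Reading from the right also means entries a client prepends to the header are never reached once an untrusted address has been found.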

What dominates

The single biggest line is metering emit (123.80 ns) — per-tenant accounting for the pay-as-you-go billing path. It's bigger than the cryptographic verification at the 2048-bit bracket. That's not a regression; it's the honest cost of multi-tenant operability. Region routing is essentially free (2.63 ns home / 1.18 ns miss). The ratelimit hot path costs ~41 ns when a fleet has burst limits configured, ~19 ns when it doesn't.

Caveats

These are single-thread microbenchmark medians on a quiescent laptop. Production cost under contention will be higher. Cold-key insertion (`ratelimit (cold-key insert)`, 135.79 ns) amortises across many subsequent hot calls, so it doesn't appear in the steady-state total. The number we publish is "steady-state happy path with a hot tenant and one Cloudflare hop", the deployment shape we expect to be most common.

Bench harness: edssa-proxy/benches/phase9_hotpath.rs, edssa-audit/benches/phase10_metering.rs, edssa-core/benches/edssa_benchmark.rs. Criterion runs captured 2026-05-26.

A small note on the work itself

This was a real sprint moving across days and time zones with a tight feedback loop. The bench numbers above ran on real hardware; the live demo handles real internet traffic. The thesis we set out to test was that high-end M2M authentication doesn't need a multinational SaaS subscription, doesn't need a quantum-vulnerable bearer token in transit, and doesn't need to fail when the central authority is unreachable. Two phases in, the thesis is holding.

— Tomas Westerholm · tw@edssa.io · EdSSA

European patent application filed at the Finnish Patent and Registration Office (PRH) on 2026-05-01 under the European patent route. PCT (Patent Cooperation Treaty) continuations and US filings targeted for 2027 Q1, within the 12-month priority window that closes 2027-05-01. A1 publication scheduled for ~2027-11.