Benchmarks
agentkernel runs on seven different backends across Linux and macOS. We benchmark all of them so you know exactly what to expect.
All numbers below are measured on real hardware -- an AMD EPYC server for Linux backends and an M3 Pro MacBook for macOS backends. No synthetic microbenchmarks. Every number represents the full end-to-end latency of `agentkernel run -- echo hello`, from command invocation to output.
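If you want to sanity-check these numbers on your own machine, a rough sketch is to wall-clock the CLI directly. This assumes `agentkernel` is on your PATH and a default backend is configured:

```bash
# Wall-clock the full CLI path -- the same span the latency numbers cover.
# Assumes `agentkernel` is on PATH with a default backend configured.
for i in 1 2 3 4 5; do
  time agentkernel run -- echo hello
done
```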
The headline numbers
| Backend | Latency | Throughput | Isolation |
|---|---|---|---|
| Hyperlight pool (Linux) | <1µs | ~3,300 RPS | Hypervisor + Wasm |
| Firecracker daemon (Linux) | 195ms | ~5.1/sec | Full VM (separate kernel) |
| Docker (macOS) | ~220ms | ~4.5/sec | Container (shared kernel) |
| Docker pool (Linux) | ~250ms | ~4.0/sec | Container (shared kernel) |
| Podman (macOS) | ~300ms | ~3.3/sec | Container (rootless) |
| Podman (Linux) | ~310ms | ~3.2/sec | Container (rootless) |
| Kubernetes (remote) | ~570ms | ~7.7/sec | Pod (NetworkPolicy) |
| Nomad (remote) | ~570ms | ~6.1/sec | Job allocation |
| Firecracker cold (Linux) | 800ms | ~1.3/sec | Full VM (separate kernel) |
| Apple Containers (macOS 26+) | ~940ms | ~1.1/sec | Full VM (separate kernel) |
Pre-warmed pools make the fastest backends feel instant, and even the slowest cold start stays under a second.
Where the time goes
Every sandbox execution has phases: boot the isolation boundary, wait for the environment to be ready, execute the command, then tear down. Here's how each backend breaks down:
| Backend | Boot | Ready | Exec | Shutdown |
|---|---|---|---|---|
| Hyperlight pool | 0ms | <1µs | <1ms | N/A |
| Firecracker daemon | 0ms | 0ms | 19ms | 0ms |
| Firecracker cold | 78ms | 110ms | 19ms | 20ms |
| Apple Containers | 860ms | 860ms | 95ms | 37ms |
Docker and Podman use a single `run --rm` operation internally, so their breakdown is a single combined step rather than separate phases.
The daemon and pool backends eliminate boot and shutdown by reusing pre-warmed instances. You pay the startup cost once, then every subsequent execution skips straight to the fast part.
Firecracker vs Docker
The comparison that matters most -- VM isolation vs container isolation on the same Linux hardware:
| Metric | Docker | Firecracker | Winner |
|---|---|---|---|
| Process start | 40ms | 46ms | Tie |
| Instance ready | 155ms | 110ms | Firecracker |
| Command execution | 53ms | 19ms | Firecracker (vsock) |
| Shutdown | 130ms | 20ms | Firecracker (6.5x) |
| Memory per instance | ~50-100MB | <10MB | Firecracker (5-10x) |
| Isolation | Shared kernel | Separate kernel | Firecracker |
Firecracker uses vsock for command execution -- a direct host-to-VM communication channel that's 3x faster than Docker's exec path. Shutdown is 6.5x faster because there's no container runtime overhead.
Firecracker's boot time was optimized from 961ms down to 110ms -- an 89% reduction -- by trimming the guest kernel command line:
| Optimization | Time saved |
|---|---|
| Disable PS/2 keyboard driver (`i8042.nokbd`) | ~500ms |
| Skip PS/2 aux port probe (`i8042.noaux`) | ~260ms |
| Quiet boot (`quiet loglevel=4`) | ~90ms |
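These flags live on the guest kernel command line. As a rough sketch of how they would be wired into a standalone Firecracker config file -- the paths and the base console/panic arguments below are placeholders, not agentkernel's actual configuration:

```bash
# Sketch: put the optimization flags on the guest kernel command line.
# kernel_image_path / rootfs paths are placeholders; the base args
# (console, reboot, panic, pci) are typical Firecracker settings, not
# necessarily what agentkernel ships.
cat > vm-config.json <<'EOF'
{
  "boot-source": {
    "kernel_image_path": "/path/to/vmlinux",
    "boot_args": "console=ttyS0 reboot=k panic=1 pci=off i8042.nokbd i8042.noaux quiet loglevel=4"
  },
  "drives": [
    {
      "drive_id": "rootfs",
      "path_on_host": "/path/to/rootfs.ext4",
      "is_root_device": true,
      "is_read_only": false
    }
  ]
}
EOF

# Boot the microVM from the config file, API server disabled.
firecracker --no-api --config-file vm-config.json
```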
Hyperlight: sub-microsecond execution
Hyperlight is the experimental backend that pushes the boundaries of what's possible. It uses Microsoft's hypervisor-isolated micro VMs to run WebAssembly modules with dual-layer security: a Wasm sandbox inside a hypervisor boundary.
The key number: warm pool acquire takes 0.2µs. That's roughly 340,000x faster than a cold Hyperlight startup (68ms) and over 1,000,000x faster than a Firecracker cold boot (800ms).
| Metric | Value |
|---|---|
| Cold startup | 68ms (avg), 67ms (p50) |
| Warm acquire | 0.2µs (avg), <1µs (p50) |
| Function call | <1ms |
| 100 concurrent requests | 0.03s (~3,333 RPS) |
For comparison, running 100 concurrent requests on other backends:
| Backend | 100 concurrent | RPS |
|---|---|---|
| Hyperlight | 0.03s | ~3,333 |
| Docker | 8.4s | ~12 |
| Podman | 18.2s | ~5.5 |
Hyperlight is 280x faster than Docker and 600x faster than Podman at concurrent workloads. The trade-off: it runs Wasm modules only, not arbitrary shell commands, and requires Linux with KVM.
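To get a rough feel for concurrent behaviour on the container backends, a crude `xargs` probe is enough. This is a sketch; the `docker` backend identifier here is an assumption, so substitute whatever name your build registers:

```bash
# 100 one-shot runs, 10 at a time; each run echoes its index.
# Backend identifier is illustrative -- adjust for your configuration.
time seq 1 100 | xargs -P 10 -I{} agentkernel run --backend docker -- echo {}
```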
Apple Containers: VM isolation on macOS
Apple Containers (macOS 26+) give you Firecracker-like isolation on Apple Silicon without requiring Linux or KVM. Each container runs in its own VM with a separate kernel.
| Metric | Docker (macOS) | Apple Containers |
|---|---|---|
| Isolation | Shared kernel | Separate VM |
| Boot time | ~175ms | ~860ms |
| Full lifecycle | ~500ms | ~940ms |
| Memory per instance | ~50MB | ~100MB+ |
Apple Containers are 2x slower than Docker on macOS, but they provide hardware-level isolation. If you're running untrusted code on macOS, that trade-off is worth it.
Docker and Podman: the container backends
Both Docker and Podman use an optimized `run --rm` path that combines creation, execution, and cleanup into a single operation. This is 35x faster than the naive start-exec-stop cycle.
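In plain Docker CLI terms, the difference looks roughly like this -- the `alpine` image, the `sleep` keep-alive, and the stop timeout are placeholders for illustration:

```bash
# Fast path: one combined operation that creates, runs, and removes the container.
docker run --rm alpine echo hello

# Naive path: the same work as five separate round-trips to the runtime.
id=$(docker create alpine sleep 300)   # keep-alive command so exec has a target
docker start "$id"
docker exec "$id" echo hello
docker stop -t 1 "$id"                 # short grace period for the demo
docker rm "$id"
```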
macOS (M3 Pro)
| Backend | Latency | Cold start |
|---|---|---|
| Docker | ~220ms | ~270ms |
| Podman | ~300ms | ~730ms |
Docker is ~30% faster than Podman on macOS due to its daemon architecture.
Linux (AMD EPYC)
| Backend | Latency | Cold start |
|---|---|---|
| Podman | ~310ms | ~350ms |
| Docker | ~350ms | ~550ms |
On Linux, Podman is ~10-15% faster because it runs daemonless -- no Docker daemon overhead.
Daemon mode: 4x speedup for repeated commands
The daemon maintains a pool of 3-5 pre-booted Firecracker VMs. When you run a command, it grabs a warm VM from the pool, executes via vsock, and returns the VM for reuse.
| Metric | Ephemeral | Daemon | Speedup |
|---|---|---|---|
| First command | 800ms | 195ms | 4.1x |
| Subsequent | 800ms | 195ms | 4.1x |
| 10 sequential | 8.0s | 1.95s | 4.1x |
| VM reuse rate | 0% | ~95% | -- |
The daemon takes ~3 seconds to start (it pre-warms 3 VMs); after that, every command benefits from the warm pool.
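A quick way to see the effect, assuming the daemon is already running and that `firecracker` is the backend identifier your build registers (an assumption, not a documented value):

```bash
# 10 sequential one-shot commands against the warm pool; each should land
# near the ~195ms daemon latency instead of a full ~800ms cold boot.
time for i in $(seq 1 10); do
  agentkernel run --backend firecracker -- echo "hello $i"
done
```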
Stress test results
Docker (macOS) -- 10 parallel sandboxes
| Metric | Value |
|---|---|
| Total time | 4.5s |
| Success rate | 100% |
| Full lifecycle (avg) | 446ms |
| Create (avg) | 44ms |
| Start (avg) | 174ms |
| Exec (avg) | 83ms |
| Stop (avg) | 109ms |
| Remove (avg) | 41ms |
Docker (macOS) -- 10 cycles, 5x2 iterations
| Metric | Value |
|---|---|
| Throughput | 1.8-2.0/sec |
| p50 latency | 498ms |
| p95 latency | 702ms |
| p99 latency | 1028ms |
Docker (Linux) -- 100 cycles, 10x10 iterations
| Metric | Value |
|---|---|
| Total wall time | 119.4s |
| Success rate | 100% |
| Avg lifecycle | 1,194ms |
| p50 | 1,178ms |
| p95 | 1,458ms |
| p99 | 1,705ms |
| Throughput | 0.84/sec |
Orchestration backends: Kubernetes and Nomad
The orchestration backends run sandboxes on remote clusters instead of the local machine. This adds network overhead but enables team-scale and multi-tenant deployments.
Numbers below are measured on both platforms: an AMD EPYC server (16 cores, 57 GB) running a single-node k3d cluster and a Nomad dev agent for Linux, and an M3 Pro MacBook (12 cores, 36 GB) running k3d and Nomad for macOS.
Single sandbox lifecycle
Full create → start → exec → stop cycle, averaged over 5 iterations:
Linux (AMD EPYC)
| Operation | Kubernetes | Nomad | Docker (baseline) |
|---|---|---|---|
| Create | 92ms | 39ms | 47ms |
| Start | 904ms | 811ms | 198ms |
| Exec | 128ms | 165ms | 68ms |
| Stop | 101ms | 38ms | 152ms |
| Total | 1,225ms | 1,053ms | 465ms |
macOS (M3 Pro)
| Operation | Kubernetes | Nomad | Docker (baseline) |
|---|---|---|---|
| Create | 100ms | 166ms | 104ms |
| Start | 785ms | 1,280ms | 231ms |
| Exec | 120ms | 392ms | 116ms |
| Stop | 78ms | 159ms | 169ms |
| Total | 1,083ms | 1,997ms | 620ms |
Kubernetes has the faster exec path of the two orchestration backends on both platforms, thanks to its WebSocket-based exec API. Docker is fastest for the full lifecycle due to simpler local container management.
One-shot run command
`agentkernel run --backend <backend> -- echo hello` (full lifecycle in one command):
| Backend | Linux | macOS |
|---|---|---|
| Kubernetes | 571ms | 594ms |
| Nomad | 569ms | 580ms |
| Docker | 580ms | 577ms |
All three backends converge to ~575ms for one-shot execution on both platforms.
Exec throughput
50 sequential exec calls on a single running sandbox:
| Backend | Linux avg | Linux RPS | macOS avg | macOS RPS |
|---|---|---|---|---|
| Docker | 67ms | 14.8/sec | 103ms | 9.6/sec |
| Kubernetes | 128ms | 7.7/sec | 99ms | 10.0/sec |
| Nomad | 163ms | 6.1/sec | 365ms | 2.7/sec |
Kubernetes and Docker trade the lead depending on platform. Nomad's alloc exec CLI path adds overhead per call.
Concurrent scale
How many sandboxes can run simultaneously on a single node:
Kubernetes (k3d single-node)
| Count | Create | Start | Running | Parallel exec |
|---|---|---|---|---|
| 5 | 116ms | 1.4s | 5/5 | 124ms |
| 10 | 134ms | 1.4s | 10/10 | 163ms |
| 20 | 241ms | 120s | 15/20 | 287ms |
The k3d single-node cluster hits resource limits at ~15 pods. A production cluster with multiple nodes handles hundreds.
Nomad (local dev agent)
| Count | Create | Start | Running | Parallel exec |
|---|---|---|---|---|
| 5 | 46ms | 814ms | 5/5 | 171ms |
| 10 | 52ms | 834ms | 10/10 | 197ms |
| 20 | 107ms | 1.4s | 20/20 | 310ms |
Nomad ran all 20 concurrent sandboxes, where k3d topped out at 15 of 20. Nomad's scheduling is more resilient on a single node, though K8s is faster per-operation when resources are available.
Choosing a backend
| Use case | Recommended | Why |
|---|---|---|
| Interactive / API server | Firecracker daemon | 195ms latency, full VM isolation |
| High-throughput Wasm | Hyperlight pool | 3,300 RPS, sub-microsecond acquire |
| macOS development (speed) | Docker | Fastest macOS backend at ~220ms |
| macOS development (security) | Apple Containers | VM isolation on macOS |
| Linux CI/CD (no KVM) | Docker | Works without KVM |
| Untrusted code (Linux) | Firecracker | Separate kernel per sandbox |
| Untrusted code (macOS) | Apple Containers | Separate VM per sandbox |
| Team / multi-tenant | Kubernetes | 7.7 exec/sec, NetworkPolicy isolation |
| HashiCorp stack | Nomad | Integrates with Consul/Vault, scales to 50+ on single node |
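Whichever row fits your use case, switching is a single flag on the run command. A minimal sketch, with illustrative backend identifiers (they may differ from the names your build actually registers):

```bash
# Backend selection is per invocation; identifiers here are illustrative.
agentkernel run --backend firecracker -- echo hello   # untrusted code on Linux
agentkernel run --backend kubernetes -- echo hello    # remote, multi-tenant cluster
```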
Running your own benchmarks
```bash
# Stress test (parallel sandbox creation)
cargo test --test stress_test -- --nocapture --ignored

# Benchmark test (repeated lifecycle with statistics)
cargo test --test benchmark_test -- --nocapture --ignored

# Shell script (per-operation latency)
./scripts/benchmark.sh

# Throughput test (100 commands, 10 concurrent)
./scripts/stress-test.sh 100 10
```
Configure with environment variables:
```bash
# Stress test
STRESS_VM_COUNT=1000 STRESS_MAX_CONCURRENT=100 cargo test --test stress_test -- --nocapture --ignored

# Benchmark test
BENCH_SANDBOXES=20 BENCH_ITERATIONS=5 cargo test --test benchmark_test -- --nocapture --ignored
```
Results are saved to `benchmark-results/` as JSON for comparison across runs.
Test hardware
| Platform | CPU | Use |
|---|---|---|
| Linux | AMD EPYC (16 cores, 57 GB) | Firecracker, Hyperlight, Docker, Podman, Kubernetes (k3d), Nomad |
| macOS | Apple M3 Pro (12 cores, 36 GB) | Docker, Podman, Apple Containers, Kubernetes (k3d), Nomad |