Benchmarks
agentkernel runs on seven different backends across Linux and macOS. We benchmark all of them so you know exactly what to expect.
All numbers below are measured on real hardware -- an AMD EPYC server for Linux backends and an M3 Pro MacBook for macOS backends. No synthetic microbenchmarks. Every number represents the full end-to-end latency of `agentkernel run -- echo hello`, from command invocation to output.
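If you want to sanity-check these numbers on your own machine, a rough sketch is to wall-clock the CLI directly. This assumes `agentkernel` is on your PATH and a default backend is configured:

```bash
# Wall-clock the full CLI path -- the same span the latency numbers cover.
# Assumes `agentkernel` is on PATH with a default backend configured.
for i in 1 2 3 4 5; do
  time agentkernel run -- echo hello
done
```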
The headline numbers
| Backend | Latency | Throughput | Isolation |
|---|---|---|---|
| Hyperlight pool (Linux) | <1µs | ~3,300 RPS | Hypervisor + Wasm |
| Firecracker daemon (Linux) | 195ms | ~5.1/sec | Full VM (separate kernel) |
| Docker (macOS) | ~220ms | ~4.5/sec | Container (shared kernel) |
| Docker pool (Linux) | ~250ms | ~4.0/sec | Container (shared kernel) |
| Podman (macOS) | ~300ms | ~3.3/sec | Container (rootless) |
| Podman (Linux) | ~310ms | ~3.2/sec | Container (rootless) |
| Kubernetes (remote) | ~570ms | ~7.7/sec | Pod (NetworkPolicy) |
| Nomad (remote) | ~570ms | ~6.1/sec | Job allocation |
| Firecracker cold (Linux) | 800ms | ~1.3/sec | Full VM (separate kernel) |
| Apple Containers (macOS 26+) | ~940ms | ~1.1/sec | Full VM (separate kernel) |
Pre-warmed pools make the fastest backends feel instant, and even the slowest cold start stays under a second.
Where the time goes
Every sandbox execution has phases: boot the isolation boundary, wait for the environment to be ready, execute the command, then tear down. Here's how each backend breaks down:
| Backend | Boot | Ready | Exec | Shutdown |
|---|---|---|---|---|
| Hyperlight pool | 0ms | <1µs | <1ms | N/A |
| Firecracker daemon | 0ms | 0ms | 19ms | 0ms |
| Firecracker cold | 78ms | 110ms | 19ms | 20ms |
| Apple Containers | 860ms | 860ms | 95ms | 37ms |
Docker and Podman use a single `run --rm` operation internally, so their breakdown is a single combined step rather than separate phases.
The daemon and pool backends eliminate boot and shutdown by reusing pre-warmed instances. You pay the startup cost once, then every subsequent execution skips straight to the fast part.
Firecracker vs Docker
The comparison that matters most -- VM isolation vs container isolation on the same Linux hardware:
| Metric | Docker | Firecracker | Winner |
|---|---|---|---|
| Process start | 40ms | 46ms | Tie |
| Instance ready | 155ms | 110ms | Firecracker |
| Command execution | 53ms | 19ms | Firecracker (vsock) |
| Shutdown | 130ms | 20ms | Firecracker (6.5x) |
| Memory per instance | ~50-100MB | <10MB | Firecracker (5-10x) |
| Isolation | Shared kernel | Separate kernel | Firecracker |
Firecracker uses vsock for command execution -- a direct host-to-VM communication channel that's 3x faster than Docker's exec path. Shutdown is 6.5x faster because there's no container runtime overhead.
Firecracker's boot time was optimized from 961ms down to 110ms -- an 89% reduction -- by trimming the guest kernel command line:
| Optimization | Time saved |
|---|---|
| Disable PS/2 keyboard driver (`i8042.nokbd`) | ~500ms |
| Skip PS/2 aux port probe (`i8042.noaux`) | ~260ms |
| Quiet boot (`quiet loglevel=4`) | ~90ms |
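These flags live on the guest kernel command line. As a rough sketch of how they would be wired into a standalone Firecracker config file -- the paths and the base console/panic arguments below are placeholders, not agentkernel's actual configuration:

```bash
# Sketch: put the optimization flags on the guest kernel command line.
# kernel_image_path / rootfs paths are placeholders; the base args
# (console, reboot, panic, pci) are typical Firecracker settings, not
# necessarily what agentkernel ships.
cat > vm-config.json <<'EOF'
{
  "boot-source": {
    "kernel_image_path": "/path/to/vmlinux",
    "boot_args": "console=ttyS0 reboot=k panic=1 pci=off i8042.nokbd i8042.noaux quiet loglevel=4"
  },
  "drives": [
    {
      "drive_id": "rootfs",
      "path_on_host": "/path/to/rootfs.ext4",
      "is_root_device": true,
      "is_read_only": false
    }
  ]
}
EOF

# Boot the microVM from the config file, API server disabled.
firecracker --no-api --config-file vm-config.json
```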
Hyperlight: sub-microsecond execution
Hyperlight is the experimental backend that pushes the boundaries of what's possible. It uses Microsoft's hypervisor-isolated micro VMs to run WebAssembly modules with dual-layer security: a Wasm sandbox inside a hypervisor boundary.
The key number: warm pool acquire takes 0.2µs. That's roughly 340,000x faster than a cold Hyperlight startup (68ms) and over 1,000,000x faster than a Firecracker cold boot (800ms).
| Metric | Value |
|---|---|
| Cold startup | 68ms (avg), 67ms (p50) |
| Warm acquire | 0.2µs (avg), <1µs (p50) |
| Function call | <1ms |
| 100 concurrent requests | 0.03s (~3,333 RPS) |
For comparison, running 100 concurrent requests on other backends:
| Backend | 100 concurrent | RPS |
|---|---|---|
| Hyperlight | 0.03s | ~3,333 |
| Docker | 8.4s | ~12 |
| Podman | 18.2s | ~5.5 |
Hyperlight is 280x faster than Docker and 600x faster than Podman at concurrent workloads. The trade-off: it runs Wasm modules only, not arbitrary shell commands, and requires Linux with KVM.
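To get a rough feel for concurrent behaviour on the container backends, a crude `xargs` probe is enough. This is a sketch; the `docker` backend identifier here is an assumption, so substitute whatever name your build registers:

```bash
# 100 one-shot runs, 10 at a time; each run echoes its index.
# Backend identifier is illustrative -- adjust for your configuration.
time seq 1 100 | xargs -P 10 -I{} agentkernel run --backend docker -- echo {}
```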
Apple Containers: VM isolation on macOS
Apple Containers (macOS 26+) give you Firecracker-like isolation on Apple Silicon without requiring Linux or KVM. Each container runs in its own VM with a separate kernel.
| Metric | Docker (macOS) | Apple Containers |
|---|---|---|
| Isolation | Shared kernel | Separate VM |
| Boot time | ~175ms | ~860ms |
| Full lifecycle | ~500ms | ~940ms |
| Memory per instance | ~50MB | ~100MB+ |
Apple Containers are 2x slower than Docker on macOS, but they provide hardware-level isolation. If you're running untrusted code on macOS, that trade-off is worth it.
Docker and Podman: the container backends
Both Docker and Podman use an optimized `run --rm` path that combines creation, execution, and cleanup into a single operation. This is 35x faster than the naive start-exec-stop cycle.
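In plain Docker CLI terms, the difference looks roughly like this -- the `alpine` image, the `sleep` keep-alive, and the stop timeout are placeholders for illustration:

```bash
# Fast path: one combined operation that creates, runs, and removes the container.
docker run --rm alpine echo hello

# Naive path: the same work as five separate round-trips to the runtime.
id=$(docker create alpine sleep 300)   # keep-alive command so exec has a target
docker start "$id"
docker exec "$id" echo hello
docker stop -t 1 "$id"                 # short grace period for the demo
docker rm "$id"
```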
macOS (M3 Pro)
| Backend | Latency | Cold start |
|---|---|---|
| Docker | ~220ms | ~270ms |
| Podman | ~300ms | ~730ms |
Docker is ~30% faster than Podman on macOS due to its daemon architecture.
Linux (AMD EPYC)
| Backend | Latency | Cold start |
|---|---|---|
| Podman | ~310ms | ~350ms |
| Docker | ~350ms | ~550ms |
On Linux, Podman is ~10-15% faster because it runs daemonless -- no Docker daemon overhead.
Daemon mode: 4x speedup for repeated commands
The daemon maintains a pool of 3-5 pre-booted Firecracker VMs. When you run a command, it grabs a warm VM from the pool, executes via vsock, and returns the VM for reuse.
| Metric | Ephemeral | Daemon | Speedup |
|---|---|---|---|
| First command | 800ms | 195ms | 4.1x |
| Subsequent | 800ms | 195ms | 4.1x |
| 10 sequential | 8.0s | 1.95s | 4.1x |
| VM reuse rate | 0% | ~95% | -- |
The daemon takes ~3 seconds to start (it pre-warms 3 VMs); after that, every command benefits from the warm pool.
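A quick way to see the effect, assuming the daemon is already running and that `firecracker` is the backend identifier your build registers (an assumption, not a documented value):

```bash
# 10 sequential one-shot commands against the warm pool; each should land
# near the ~195ms daemon latency instead of a full ~800ms cold boot.
time for i in $(seq 1 10); do
  agentkernel run --backend firecracker -- echo "hello $i"
done
```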
Stress test results
Docker (macOS) -- 10 parallel sandboxes
| Metric | Value |
|---|---|
| Total time | 4.5s |
| Success rate | 100% |
| Full lifecycle (avg) | 446ms |
| Create (avg) | 44ms |
| Start (avg) | 174ms |
| Exec (avg) | 83ms |
| Stop (avg) | 109ms |
| Remove (avg) | 41ms |
Docker (macOS) -- 10 cycles, 5x2 iterations
| Metric | Value |
|---|---|
| Throughput | 1.8-2.0/sec |
| p50 latency | 498ms |
| p95 latency | 702ms |
| p99 latency | 1028ms |
Docker (Linux) -- 100 cycles, 10x10 iterations
| Metric | Value |
|---|---|
| Total wall time | 119.4s |
| Success rate | 100% |
| Avg lifecycle | 1,194ms |
| p50 | 1,178ms |
| p95 | 1,458ms |
| p99 | 1,705ms |
| Throughput | 0.84/sec |
Orchestration backends: Kubernetes and Nomad
The orchestration backends run sandboxes on remote clusters instead of the local machine. This adds network overhead but enables team-scale and multi-tenant deployments.
Numbers below are measured on both platforms: an AMD EPYC server (16 cores, 57 GB) running a single-node k3d cluster and a Nomad dev agent for Linux, and an M3 Pro MacBook (12 cores, 36 GB) running k3d and Nomad for macOS.
Single sandbox lifecycle
Full create → start → exec → stop cycle, averaged over 5 iterations:
Linux (AMD EPYC)
| Operation | Kubernetes | Nomad | Docker (baseline) |
|---|---|---|---|
| Create | 92ms | 39ms | 47ms |
| Start | 904ms | 811ms | 198ms |
| Exec | 128ms | 165ms | 68ms |
| Stop | 101ms | 38ms | 152ms |
| Total | 1,225ms | 1,053ms | 465ms |
macOS (M3 Pro)
| Operation | Kubernetes | Nomad | Docker (baseline) |
|---|---|---|---|
| Create | 100ms | 166ms | 104ms |
| Start | 785ms | 1,280ms | 231ms |
| Exec | 120ms | 392ms | 116ms |
| Stop | 78ms | 159ms | 169ms |
| Total | 1,083ms | 1,997ms | 620ms |
Kubernetes has the faster exec path of the two orchestration backends on both platforms, thanks to its WebSocket-based exec API. Docker is fastest for the full lifecycle due to simpler local container management.
One-shot run command
`agentkernel run --backend <backend> -- echo hello` (full lifecycle in one command):
| Backend | Linux | macOS |
|---|---|---|
| Kubernetes | 571ms | 594ms |
| Nomad | 569ms | 580ms |
| Docker | 580ms | 577ms |
All three backends converge to ~575ms for one-shot execution on both platforms.
Exec throughput
50 sequential exec calls on a single running sandbox:
| Backend | Linux avg | Linux RPS | macOS avg | macOS RPS |
|---|---|---|---|---|
| Docker | 67ms | 14.8/sec | 103ms | 9.6/sec |
| Kubernetes | 128ms | 7.7/sec | 99ms | 10.0/sec |
| Nomad | 163ms | 6.1/sec | 365ms | 2.7/sec |
Kubernetes and Docker trade the lead depending on platform. Nomad's alloc exec CLI path adds overhead per call.
Concurrent scale
How many sandboxes can run simultaneously on a single node:
Kubernetes (k3d single-node)
| Count | Create | Start | Running | Parallel exec |
|---|---|---|---|---|
| 5 | 116ms | 1.4s | 5/5 | 124ms |
| 10 | 134ms | 1.4s | 10/10 | 163ms |
| 20 | 241ms | 120s | 15/20 | 287ms |
The k3d single-node cluster hits resource limits at ~15 pods. A production cluster with multiple nodes handles hundreds.
Nomad (local dev agent)
| Count | Create | Start | Running | Parallel exec |
|---|---|---|---|---|
| 5 | 46ms | 814ms | 5/5 | 171ms |
| 10 | 52ms | 834ms | 10/10 | 197ms |
| 20 | 107ms | 1.4s | 20/20 | 310ms |
Nomad ran all 20 concurrent sandboxes, where k3d topped out at 15 of 20. Nomad's scheduling is more resilient on a single node, though K8s is faster per-operation when resources are available.
Choosing a backend
| Use case | Recommended | Why |
|---|---|---|
| Interactive / API server | Firecracker daemon | 195ms latency, full VM isolation |
| High-throughput Wasm | Hyperlight pool | 3,300 RPS, sub-microsecond acquire |
| macOS development (speed) | Docker | Fastest macOS backend at ~220ms |
| macOS development (security) | Apple Containers | VM isolation on macOS |
| Linux CI/CD (no KVM) | Docker | Works without KVM |
| Untrusted code (Linux) | Firecracker | Separate kernel per sandbox |
| Untrusted code (macOS) | Apple Containers | Separate VM per sandbox |
| Team / multi-tenant | Kubernetes | 7.7 exec/sec, NetworkPolicy isolation |
| HashiCorp stack | Nomad | Integrates with Consul/Vault, scales to 50+ on single node |
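Whichever row fits your use case, switching is a single flag on the run command. A minimal sketch, with illustrative backend identifiers (they may differ from the names your build actually registers):

```bash
# Backend selection is per invocation; identifiers here are illustrative.
agentkernel run --backend firecracker -- echo hello   # untrusted code on Linux
agentkernel run --backend kubernetes -- echo hello    # remote, multi-tenant cluster
```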
Running your own benchmarks
```bash
# Stress test (parallel sandbox creation)
cargo test --test stress_test -- --nocapture --ignored

# Benchmark test (repeated lifecycle with statistics)
cargo test --test benchmark_test -- --nocapture --ignored

# Shell script (per-operation latency)
./scripts/benchmark.sh

# Throughput test (100 commands, 10 concurrent)
./scripts/stress-test.sh 100 10
```
Configure with environment variables:
```bash
# Stress test
STRESS_VM_COUNT=1000 STRESS_MAX_CONCURRENT=100 cargo test --test stress_test -- --nocapture --ignored

# Benchmark test
BENCH_SANDBOXES=20 BENCH_ITERATIONS=5 cargo test --test benchmark_test -- --nocapture --ignored
```
Results are saved to `benchmark-results/` as JSON for comparison across runs.
Test hardware
| Platform | CPU | Use |
|---|---|---|
| Linux | AMD EPYC (16 cores, 57 GB) | Firecracker, Hyperlight, Docker, Podman, Kubernetes (k3d), Nomad |
| macOS | Apple M3 Pro (12 cores, 36 GB) | Docker, Podman, Apple Containers, Kubernetes (k3d), Nomad |