Durable Objects
Stateful sandbox actors with hibernation and alarms. Tracks agentkernel-2sn.
What It Does
A Durable Object is a sandbox that maintains persistent state across calls.
Each object has a unique identity (class + id), an in-sandbox HTTP
server for method dispatch, and key-value storage that survives hibernation
and restarts.
Use cases: a per-user session cache, a rate limiter, a coordinator that tracks agent progress across multiple sandboxes, or a build cache that persists between CI runs.
Architecture
SDK                             Server                          Sandbox (port 9333)
 |                               |                               |
 |-- call("counter","a",inc)---->|                               |
 |                               |-- lookup (class=counter,id=a) |
 |                               |   status = Hibernating        |
 |                               |-- start sandbox ------------->|
 |                               |-- restore storage to sandbox  |
 |                               |-- POST :9333/increment ------>|
 |                               |                               |-- update state
 |                               |                               |-- return result
 |                               |<---- { "value": 42 } ---------|
 |<-- { "result": {"value":42}} -|                               |
 |                               |                               |
 |   (idle timeout elapses)      |                               |
 |                               |-- persist storage             |
 |                               |-- stop sandbox (hibernate) -->|
 |                               |   status = Hibernating        |
Server-owned lifecycle: The server decides when to start, hibernate, and wake objects. The SDK is a thin HTTP client. The in-sandbox HTTP server handles method dispatch — the server forwards calls to it.
Object Lifecycle
                   ┌──────────────┐
       call()      │              │      idle timeout
  ┌───────────────>│    Active    │─────────────────────┐
  │                │              │                     │
  │                └──────┬───────┘                     v
  │                       │                   ┌─────────────────┐
  │                     call()                │                 │
  │                       │                   │   Hibernating   │
  │                       v                   │                 │
  │                ┌──────────────┐           └────────┬────────┘
  │                │              │                    │
  │                │    Active    │<───────────────────┘
  │                │              │   call() (auto-wake)
  │                └──────────────┘
  │
  │   delete()            ┌──────────────┐
  └──────────────────────>│   Deleted    │
                          └──────────────┘
States
| State | Sandbox | Storage | Description |
|---|---|---|---|
| Active | Running | In-memory + persisted | Object is handling calls |
| Hibernating | Stopped | Persisted in SQLite | Object is idle, no sandbox running |
| Deleted | Stopped | Purged | Object and all storage removed |
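The lifecycle above can be read as a small state machine. The following is a minimal sketch, not the server's actual implementation: the event names (`call`, `idle_timeout`, `delete`) are illustrative labels taken from the diagram, and allowing `delete` from both Active and Hibernating is an assumption.

```python
from enum import Enum


class Status(Enum):
    ACTIVE = "Active"
    HIBERNATING = "Hibernating"
    DELETED = "Deleted"


# Allowed transitions, keyed by (current status, event).
# Assumption: delete() is accepted from either live state.
TRANSITIONS = {
    (Status.ACTIVE, "call"): Status.ACTIVE,
    (Status.ACTIVE, "idle_timeout"): Status.HIBERNATING,
    (Status.HIBERNATING, "call"): Status.ACTIVE,  # auto-wake
    (Status.ACTIVE, "delete"): Status.DELETED,
    (Status.HIBERNATING, "delete"): Status.DELETED,
}


def next_status(current: Status, event: str) -> Status:
    """Return the status after `event`, or raise if the transition is not allowed."""
    try:
        return TRANSITIONS[(current, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not valid in state {current.value}") from None
```

In this sketch Deleted is terminal: any event on a deleted object raises `ValueError`.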
Hibernation
When an object has been idle for idle_timeout (default: 5 minutes), the
server:
- Reads the object's storage from the in-sandbox HTTP server (GET :9333/__storage).
- Persists the storage to SQLite (object_storage table).
- Stops the sandbox.
- Sets status to Hibernating.
On the next call(), the server:
- Creates and starts a new sandbox.
- Injects the stored key-value pairs via POST :9333/__storage.
- Forwards the method call.
- Sets status to Active.
Consistency guarantee: Storage is persisted atomically within a single
SQLite transaction. A crash during hibernation leaves the object in the
Active state; the server re-reads storage on the next hibernation attempt.
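As an illustration of the single-transaction guarantee, here is a minimal sketch using Python's built-in `sqlite3` module. Only the table name `object_storage` comes from this document; the column layout and the function names `persist_storage` and `load_storage` are assumptions made for the example.

```python
import json
import sqlite3


def persist_storage(conn, class_name, object_id, storage):
    """Replace all persisted keys for one object inside a single transaction."""
    with conn:  # commits on success, rolls back if anything inside raises
        conn.execute(
            "DELETE FROM object_storage WHERE class = ? AND id = ?",
            (class_name, object_id),
        )
        conn.executemany(
            "INSERT INTO object_storage (class, id, key, value) VALUES (?, ?, ?, ?)",
            [(class_name, object_id, k, json.dumps(v)) for k, v in storage.items()],
        )


def load_storage(conn, class_name, object_id):
    """Read back the persisted key-value pairs for one object."""
    rows = conn.execute(
        "SELECT key, value FROM object_storage WHERE class = ? AND id = ?",
        (class_name, object_id),
    )
    return {key: json.loads(value) for key, value in rows}


# Hypothetical schema: one row per (class, id, key), values stored as JSON text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE object_storage ("
    "class TEXT, id TEXT, key TEXT, value TEXT, PRIMARY KEY (class, id, key))"
)
persist_storage(conn, "counter", "user-123", {"count": 42})
```

If a later `persist_storage` call fails partway through, the rollback leaves the previously persisted snapshot intact, which matches the behavior described above.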
In-Sandbox HTTP Server
Each Durable Object sandbox runs an HTTP server on port 9333. The server handles method dispatch and storage management.
Protocol
| Endpoint | Method | Purpose |
|---|---|---|
| POST /:method | POST | Call a method on the object |
| GET /__storage | GET | Dump all key-value pairs (for hibernation) |
| POST /__storage | POST | Restore key-value pairs (on wake) |
| GET /__health | GET | Health check |
Method Dispatch
Response:
The in-sandbox server is user-defined code. The server provides a template or SDK helper for each language to handle the boilerplate.
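To make the protocol concrete, here is a minimal sketch of an in-sandbox server using only the Python standard library. It is not the official template. The JSON body shapes, the health and restore response payloads, and the `increment` method are assumptions; only the endpoint paths and port 9333 come from this document.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

storage = {}  # in-memory state, dumped and restored via /__storage


def increment(args):
    """Hypothetical user-defined method: add `amount` to the counter."""
    storage["count"] = storage.get("count", 0) + args.get("amount", 1)
    return {"value": storage["count"]}


METHODS = {"increment": increment}


def route(verb, path, body):
    """Map one request to (HTTP status, JSON-serializable payload)."""
    if verb == "GET" and path == "/__health":
        return 200, {"ok": True}
    if verb == "GET" and path == "/__storage":
        return 200, dict(storage)
    if verb == "POST" and path == "/__storage":
        storage.clear()
        storage.update(body)
        return 200, {"restored": len(body)}
    name = path.lstrip("/")
    if verb == "POST" and name in METHODS:
        return 200, METHODS[name](body)
    return 404, {"error": "unknown method"}


class Handler(BaseHTTPRequestHandler):
    def _respond(self, verb):
        length = int(self.headers.get("Content-Length") or 0)
        body = json.loads(self.rfile.read(length)) if length else {}
        status, payload = route(verb, self.path, body)
        data = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self):
        self._respond("GET")

    def do_POST(self):
        self._respond("POST")


def serve(port=9333):
    """Block forever, serving the object on the given port."""
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

Calling `serve()` as the sandbox's init command would start the listener. Keeping the routing in a plain function (`route`) makes the dispatch logic testable without opening a socket.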
Storage Protocol
Dump (GET /__storage):
Restore (POST /__storage):
Storage values are JSON. Max size per key: 1 MB. Max keys per object: 10,000.
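The per-key and per-object limits can be checked before a dump is accepted. This is a minimal sketch: `validate_storage` is a hypothetical helper, and it assumes "1 MB" means 1 MiB measured on the UTF-8 encoded JSON of each value.

```python
import json

MAX_KEYS = 10_000
MAX_VALUE_BYTES = 1024 * 1024  # assumption: 1 MB is interpreted as 1 MiB


def validate_storage(storage):
    """Return a list of limit violations; an empty list means the dump is acceptable."""
    problems = []
    if len(storage) > MAX_KEYS:
        problems.append(f"too many keys: {len(storage)} > {MAX_KEYS}")
    for key, value in storage.items():
        # Size is measured on the serialized JSON, including quotes and brackets.
        size = len(json.dumps(value).encode("utf-8"))
        if size > MAX_VALUE_BYTES:
            problems.append(f"value for {key!r} is {size} bytes")
    return problems
```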
SDK API
Python
from agentkernel import AgentKernel
client = AgentKernel()
# Call a method (auto-creates + auto-wakes)
result = client.objects.call("counter", "user-123", method="increment", args={"amount": 5})
print(result) # {"value": 42}
# Get object status
info = client.objects.get("counter", "user-123")
print(info.status) # "Active" | "Hibernating"
print(info.storage) # {"count": 42, ...}
# List objects
items = client.objects.list(class_name="counter", status="Active")
# Delete object + storage
client.objects.delete("counter", "user-123")
# Set an alarm
client.objects.set_alarm(
    "counter", "user-123",
    method="reset",
    fire_at="2026-02-16T00:00:00Z",
)
Node.js / TypeScript
import { AgentKernel } from "@anthropic/agentkernel";
const client = new AgentKernel();
const result = await client.objects.call("counter", "user-123", {
  method: "increment",
  args: { amount: 5 },
});
const info = await client.objects.get("counter", "user-123");
await client.objects.delete("counter", "user-123");
await client.objects.setAlarm("counter", "user-123", {
  method: "reset",
  fireAt: "2026-02-16T00:00:00Z",
});
Go
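client := agentkernel.New()
result, err := client.Objects.Call(ctx, "counter", "user-123", agentkernel.ObjectCall{
    Method: "increment",
    Args:   map[string]any{"amount": 5},
})
if err != nil {
    return err // calls are not retried server-side, so surface the error
}
info, err := client.Objects.Get(ctx, "counter", "user-123")
if err != nil {
    return err
}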
Rust
let client = AgentKernel::new();
let result = client.objects().call("counter", "user-123", "increment", &args).await?;
let info = client.objects().get("counter", "user-123").await?;
Swift
let client = AgentKernel()
let result = try await client.objects.call(
    class: "counter", id: "user-123",
    method: "increment", args: ["amount": 5]
)
let info = try await client.objects.get(class: "counter", id: "user-123")
Alarms
Alarms schedule a method call on an object at a future time. The daemon's
cron scheduler fires alarms by calling the object's method via the same
call() path.
Setting Alarms
POST /objects/counter/user-123/alarms
{
  "method": "reset",
  "args": {},
  "fire_at": "2026-02-16T00:00:00Z"
}
Alarm Guarantees
- At-least-once delivery: If the server crashes after an alarm fires but before marking it fired = 1, it will fire again on restart.
- No exact-time guarantee: Alarms fire within 1 minute of fire_at. The daemon polls for pending alarms every 30 seconds.
- Alarm deduplication: Setting a new alarm for the same (class, id, method) replaces the previous pending alarm.
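The deduplication and at-least-once rules above can be sketched with an in-memory store. `AlarmStore` and its method names are hypothetical, and the real server persists alarms rather than holding them in a dict. The point of the sketch is the ordering: an alarm stays pending until it is explicitly marked fired, so a crash between dispatch and `mark_fired` causes a second delivery.

```python
from datetime import datetime, timezone


class AlarmStore:
    """In-memory stand-in for the pending-alarm table (illustrative only)."""

    def __init__(self):
        self._pending = {}  # (class, id, method) -> fire_at

    def set_alarm(self, class_name, object_id, method, fire_at):
        # The same (class, id, method) key overwrites the previous pending alarm.
        self._pending[(class_name, object_id, method)] = fire_at

    def due(self, now):
        """Return keys of alarms whose fire_at has passed, without removing them."""
        return sorted(key for key, fire_at in self._pending.items() if fire_at <= now)

    def mark_fired(self, key):
        """Remove an alarm only after its method call has been dispatched."""
        self._pending.pop(key, None)


store = AlarmStore()
store.set_alarm("counter", "user-123", "reset", datetime(2026, 2, 16, tzinfo=timezone.utc))
# Setting it again replaces the earlier fire_at instead of adding a second alarm.
store.set_alarm("counter", "user-123", "reset", datetime(2026, 2, 17, tzinfo=timezone.utc))
```

A poller would call `due(now)` on each tick (every 30 seconds per the text above), dispatch each alarm through the normal call path, and then call `mark_fired`.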
Alarm Retry
If the method call fails, the alarm is retried with exponential backoff
(1s, 2s, 4s) up to 3 attempts. After 3 failures, the alarm is marked
fired = 1 and an AlarmFailed audit event is logged.
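The retry schedule can be sketched as follows. The text leaves open whether the initial fire counts as one of the 3 attempts; this sketch takes one reading, in which the function runs after the initial fire has already failed and waits 1s, 2s, and 4s before each of three retries. `retry_alarm` and its injectable `sleep` parameter are illustrative names.

```python
import time

RETRY_DELAYS = (1, 2, 4)  # seconds, from the backoff schedule above


def retry_alarm(fire, sleep=time.sleep):
    """Retry a failed alarm call.

    Returns True if a retry succeeded, or False once all retries have failed.
    On False the caller would mark the alarm fired = 1 and log AlarmFailed.
    """
    for delay in RETRY_DELAYS:
        sleep(delay)
        try:
            fire()
            return True
        except Exception:
            continue  # fall through to the next, longer delay
    return False
```

Passing a fake `sleep` (for example, a list's `append`) lets the schedule be checked without waiting.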
Object Registration
Objects must declare their class and supported methods before use.
Via agentkernel.toml
[[objects]]
class = "counter"
image = "node:22-alpine"
idle_timeout_seconds = 300 # 5 minutes
init_command = ["node", "/app/counter-server.js"]
[[objects]]
class = "session-cache"
image = "python:3.12-alpine"
idle_timeout_seconds = 600 # 10 minutes
init_command = ["python", "/app/cache_server.py"]
Via API
POST /objects/definitions
{
  "class": "counter",
  "image": "node:22-alpine",
  "idle_timeout_seconds": 300,
  "init_command": ["node", "/app/counter-server.js"]
}
Naming and Addressing
Objects are addressed by (class, id). The server maps this to a sandbox named do-<class>-<id>.
Examples:
- ("counter", "user-123") → sandbox do-counter-user-123
- ("session", "abc") → sandbox do-session-abc
The sandbox name must be unique. If a sandbox with the same name already exists (from a previous non-hibernated run), the server reuses it.
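Based on the two examples above, the mapping appears to be plain concatenation with a `do-` prefix. A minimal sketch under that assumption (`sandbox_name` is a hypothetical helper, not a documented API):

```python
def sandbox_name(class_name: str, object_id: str) -> str:
    """Derive the sandbox name for an object, assuming the do-<class>-<id> scheme."""
    return f"do-{class_name}-{object_id}"
```

If the real scheme is this simple, class names or ids that contain hyphens can collide: ("a-b", "c") and ("a", "b-c") both map to do-a-b-c. Whether the server escapes or rejects such names is not stated here.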
Retry and Failure
Method Call Failures
If a method call to the in-sandbox server fails:
- Sandbox not running → auto-wake (start + restore storage + retry).
- HTTP error from in-sandbox server → return error to caller (no retry).
- Sandbox start failure → return 503 to caller with sandbox_unavailable.
- Timeout (method takes >30s) → return 504 to caller.
Method calls are not retried by the server. The caller decides whether to retry. This differs from Durable Functions activities, which have server-side retry policies. Rationale: object methods may have non-idempotent side effects (e.g., incrementing a counter), so blind retry would cause double-counting.
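Since retry is left to the caller, a client might apply a policy like the one sketched below. This is one possible policy, not part of the SDK: `should_retry` is a hypothetical helper, and it assumes a 503 means the method never ran (the sandbox did not start), while a 504 means it may have run.

```python
def should_retry(status: int, idempotent: bool) -> bool:
    """Decide whether a caller should retry a failed object call."""
    if status == 503:
        # sandbox_unavailable: assumed safe because the method was never invoked.
        return True
    if status == 504:
        # Timeout: the method may have completed, so only retry idempotent methods.
        return idempotent
    # Errors returned by the in-sandbox server are passed through and not retried.
    return False
```

Under this policy a timed-out `increment` would not be retried, which avoids the double-counting described above.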
Storage Consistency
- Storage is eventually consistent with the in-sandbox state. The server reads storage from the sandbox on hibernation, not on every call.
- If the sandbox crashes without hibernating, the last persisted storage is used. Any in-memory-only changes since the last hibernation are lost.
- To force a storage checkpoint without hibernating, call POST /__storage/checkpoint on the in-sandbox server (the server will read and persist).
Observability
- GET /objects/:class/:id includes storage dump and status.
- Prometheus metrics: agentkernel_objects_active{class}, agentkernel_object_calls_total{class, method}, agentkernel_object_hibernations_total{class}, agentkernel_object_wake_duration_seconds{class}.
- Audit log: ObjectCreated, ObjectHibernated, ObjectWoken, ObjectDeleted.
Limits
| Limit | Value | Rationale |
|---|---|---|
| Max storage keys per object | 10,000 | SQLite row count |
| Max storage value size | 1 MB | SQLite BLOB |
| Max total storage per object | 50 MB | Disk budget |
| Max concurrent active objects | 200 | Sandbox resource constraints |
| Max alarms per object | 100 | Prevent alarm spam |
| Method call timeout | 30s | Prevent hung objects |
| Idle timeout | 5 min (default) | Configurable per class |
| Hibernation storage dump timeout | 10s | Prevent hung dumps |