AI Agent Sandboxes: The Infrastructure Layer Every Builder Needs to Understand

Stan Sedberry

Sandboxes have become the single most critical infrastructure primitive for moving AI agents from demos to production. Here's everything developers and founders need to know — from isolation technologies and architecture patterns to the $260M+ market taking shape around them.

AI agents are writing code, browsing the web, calling APIs, operating GUIs, and training models. They're doing it autonomously, at machine speed, millions of times per day. And every single one of those actions is a potential security incident if the agent isn't running inside a sandbox.

This isn't a theoretical concern. In 2023, security researcher Johann Rehberger demonstrated that ChatGPT's Code Interpreter could be tricked via prompt injection into exfiltrating uploaded data to attacker-controlled servers. He later discovered that Code Interpreter sandboxes were shared between different GPTs for the same user — meaning a malicious GPT could steal files from your conversations with other GPTs. OpenAI took over 90 days to fix it.

That was 2023. In March 2026, BeyondTrust discovered that AWS Bedrock AgentCore's "Sandbox Mode" permits outbound DNS queries, enabling full command-and-control channels through DNS tunneling. AWS chose not to fix it — instead updating documentation to recommend "VPC mode." The Langflow platform had an unauthenticated remote code execution vulnerability that persisted for two full years, with no sandboxing in place. Threat actors deployed botnets through compromised instances. Lakera demonstrated a zero-click exploit chain against Cursor IDE via MCP where a malicious Google Doc could trigger credential harvesting without any user interaction.

The pattern is unmistakable: agents without sandboxes aren't a security risk — they're a security certainty.

And the market has responded. Over $260 million in venture funding has poured into purpose-built sandbox companies. Firecracker microVMs have become the gold-standard isolation technology. Alibaba open-sourced OpenSandbox, which passed 9,000 GitHub stars in under three weeks. And the UK AI Safety Institute published the first reproducible benchmark for LLM sandbox escape, proving that frontier models can and do break out of improperly configured containers.

Why Agents Without Sandboxes Are Playing With Fire

The fundamental problem is deceptively simple: LLMs are non-deterministic systems that generate and execute code. An agent tasked with data analysis might write a script that deletes files, opens network connections, or exfiltrates credentials — not out of malice, but because the model's next-token prediction led there. As Bunnyshell puts it in their sandboxing guide: running such workloads without proper isolation is like giving an untrained intern root access to your production servers.

The Veracode 2025 GenAI Code Security Report found that 45% of AI-generated code introduces security vulnerabilities. OWASP ranks prompt injection as the number one AI threat, and Obsidian Security reports it present in over 73% of production deployments. These numbers frame the scale of what sandboxes must contain.

The Threat Surface Is Concrete and Well-Documented

The NVIDIA AI Red Team identified three mandatory controls every agent deployment needs. First, network egress controls — without which attackers establish reverse shells and exfiltrate API tokens. Second, filesystem write restrictions — since files like ~/.zshrc execute automatically and enable persistent backdoors. Third, configuration file protection — since .cursorrules and CLAUDE.md files give attackers durable control over agent behavior across sessions. Their core finding is stark: application-level controls are insufficient because once control passes to a subprocess, the application has no visibility into or control over what happens next.

The real-world incident list extends well beyond the examples above. LayerX found that Claude Desktop Extensions execute without sandboxing and with full host privileges, enabling zero-click remote code execution from a single calendar event — rated CVSS 10 out of 10. The Shai-Hulud supply chain attack of late 2025 compromised over 700 npm packages, created 25,000+ malicious GitHub repos, and exposed roughly 14,000 secrets across 487 organizations.

These incidents establish a clear principle: sandboxing is not defense-in-depth. It is the primary containment mechanism for autonomous AI systems.

How Sandbox Isolation Actually Works Under the Hood

The technical landscape for agent sandboxing has converged around five isolation approaches, each with distinct trade-offs between security, performance, and operational complexity.

Firecracker MicroVMs: The Gold Standard

Firecracker provides hardware-level isolation using Linux KVM. Each sandbox gets its own guest kernel — a fundamental security advantage over containers, which share the host kernel and therefore share its attack surface. Written in roughly 50,000 lines of Rust (compared to QEMU's 1.4 million lines of C), Firecracker exposes only five emulated devices, eliminating vast categories of potential exploits.

Cold start performance is 125 milliseconds or less to user-space code, with memory overhead under 5 MiB per VM. But the real performance breakthrough is snapshot/restore: Firecracker can serialize full VM state — CPU registers, memory pages, device state — and restore it in as little as 4 milliseconds. AWS Lambda's SnapStart uses this technique to eliminate cold starts entirely: boot once, snapshot, clone on demand. E2B, Fly.io Sprites, Cloudflare Sandbox, and Vercel all build on Firecracker.
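To make the footprint concrete, here is a minimal VM definition in the JSON format Firecracker's --config-file flag accepts. The kernel and rootfs paths are placeholders for artifacts you build yourself; vCPU and memory values are illustrative.

```json
{
  "boot-source": {
    "kernel_image_path": "/var/lib/firecracker/vmlinux",
    "boot_args": "console=ttyS0 reboot=k panic=1"
  },
  "drives": [
    {
      "drive_id": "rootfs",
      "path_on_host": "/var/lib/firecracker/rootfs.ext4",
      "is_root_device": true,
      "is_read_only": false
    }
  ],
  "machine-config": {
    "vcpu_count": 1,
    "mem_size_mib": 128
  }
}
```

That is the entire machine: one kernel, one disk, a slice of CPU and memory. There is no BIOS emulation, no PCI bus, and no legacy device surface for escaping code to probe.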

The scale numbers are staggering. MiniMax and Tencent Cloud's Agent Runtime Sandbox (Forge) now run 100,000+ concurrent agent sandboxes with 600,000 sandboxes delivered per minute and 80ms median spin-up times — a 96% reduction in cold starts. Alibaba's OpenSandbox matches this scale at 15,000+ sandboxes per minute on Kubernetes.

gVisor: The Pragmatic Middle Ground

Google's gVisor is a user-space kernel written in memory-safe Go that intercepts all syscalls from sandboxed containers and handles them entirely in user space. Of Linux's roughly 350 syscalls, gVisor's Sentry component implements 237 but itself uses only 68 host syscalls — a massive attack surface reduction without the overhead of hardware virtualization. The trade-off is a 10-30% performance penalty on I/O-heavy workloads.
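Trying gVisor is low-friction if you already run Docker: the documented setup is to register runsc as an alternative runtime in /etc/docker/daemon.json (the binary path below assumes a default install location).

```json
{
  "runtimes": {
    "runsc": {
      "path": "/usr/local/bin/runsc"
    }
  }
}
```

After restarting the Docker daemon, adding `--runtime=runsc` to any `docker run` command executes that container against gVisor's user-space kernel instead of directly against the host's.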

gVisor powers Google Cloud Run, Cloud Functions, and App Engine. It's the isolation technology behind Modal's sandboxes and Google's new Agent Sandbox for GKE, which shipped formal Kubernetes CRDs (Sandbox, SandboxTemplate, SandboxWarmPool) in March 2026.

V8 Isolates: The Fastest Option at the Narrowest Scope

Cloudflare Workers run thousands of lightweight execution contexts within a single OS process, with cold starts of roughly 1-5 milliseconds — about 100x faster than containers. Cloudflare layers five security boundaries: the isolate boundary itself, the V8 sandbox, process-level seccomp filtering, trust-based cordoning, and hardware memory protection keys.

In March 2026, Cloudflare launched Dynamic Workers in open beta, specifically designed for AI agent code execution with what they describe as 100x faster sandbox creation than traditional containers and no concurrency limits.

The most interesting innovation is Cap'n Web RPC bindings. Instead of proxying HTTP requests and hoping credentials stay out, developers define typed TypeScript interfaces that expose exactly the operations allowed. The agent sees only the interface definition. No raw API keys ever enter the sandbox.
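The same idea can be sketched in any host language. Here is a minimal Python analogue of a typed binding — the class, method names, and RPC dispatch below are illustrative inventions, not the Cap'n Web API — showing how the credential stays on the host side of the boundary:

```python
class IssueTrackerBinding:
    """Host-side object that holds the credential. The sandboxed agent is
    handed only an RPC surface that can invoke the allowlisted methods."""

    ALLOWED = {"create_issue", "list_issues"}

    def __init__(self, api_key: str):
        self._api_key = api_key  # lives host-side, never serialized into the sandbox

    def create_issue(self, title: str, body: str) -> str:
        # A real binding would call the upstream API here, attaching
        # self._api_key outside the sandbox boundary.
        return f"issue-{len(title)}"

    def list_issues(self) -> list:
        return []

    def dispatch(self, method: str, *args):
        """Entry point for calls arriving from the sandbox over RPC."""
        if method not in self.ALLOWED:
            raise PermissionError(f"method {method!r} not exposed to sandbox")
        return getattr(self, method)(*args)
```

The agent can create and list issues; it cannot read the key, open arbitrary URLs, or call anything outside the two allowlisted operations.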

WebAssembly: Formally Verifiable Memory Safety

WebAssembly provides the fastest cold starts of any isolation technology (sub-10ms) with formally verifiable memory safety. Each Wasm module operates in its own linear memory space with capability-based security — zero privileges by default. NVIDIA has demonstrated using Pyodide (CPython compiled to Wasm) for sandboxing LLM-generated Python in agentic workflows.

The limitation remains ecosystem maturity. Python scientific computing libraries and ML frameworks often lack WASI compatibility, constraining practical agent use cases.

Standard Docker Containers: The Weakest Link

Standard Docker containers remain the most commonly used and the least secure option for agent workloads. The shared kernel is the critical weakness — multiple container escape CVEs in 2024 and 2025 demonstrated practical breakout paths. The NVIDIA AI Red Team states plainly that standard containers are insufficient for untrusted code.

Docker has effectively conceded this point. Docker Sandboxes (Desktop 4.60+) now run each sandbox in a dedicated microVM with its own Docker daemon.

The Isolation Technology Comparison

The choice between these technologies isn't one-size-fits-all:

Firecracker microVMs offer the strongest security boundary (hardware KVM), cold starts of 80-125ms (4ms with snapshot restore), and support for any language. Best for untrusted agent code, RL training environments, and GUI agents.

gVisor provides strong isolation via its user-space kernel, millisecond-level cold starts with warm pools, and any OCI container. Best for Kubernetes-native deployments and stateful enterprise agents.

V8 isolates deliver the fastest cold starts (1-5ms) through language-runtime isolation, with unlimited concurrency. Best for edge deployments and agents using Code Mode with typed APIs. Limited to JavaScript, TypeScript, and WebAssembly.

WebAssembly achieves sub-10ms cold starts with formally verifiable memory safety. Best for stateless tool execution. Limited by ecosystem maturity for Python-heavy agent workloads.

Docker containers have the slowest cold starts (300ms-1s) and weakest isolation through a shared kernel. Should only be used for trusted internal code — never for untrusted agent output.

Beyond Compute: The Three Additional Isolation Layers

Production sandboxes require more than just process isolation. Three additional layers are essential.

Network sandboxing follows a default-deny model. E2B offers configurable firewall modes — allow-all, deny-all, and custom rules matching domains via SNI inspection. Vercel supports dynamic policies that can be tightened at runtime.

Filesystem isolation restricts what agents can read and write. Anthropic's Claude Code uses bubblewrap on Linux and sandbox-exec on macOS to scope file access to specific directories.
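The flavor of this scoping is easy to see with bubblewrap. The helper below — directory choices and flag selection are illustrative, and it assumes bwrap is installed — assembles an invocation that grants read-only system directories, a single writable working directory, and no network:

```python
def bwrap_command(workdir: str, script: str) -> list[str]:
    """Build a bubblewrap argv for running an agent-written script with
    one writable directory, fresh /tmp, and no network namespace."""
    return [
        "bwrap",
        "--ro-bind", "/usr", "/usr",     # toolchain visible but read-only
        "--ro-bind", "/etc", "/etc",
        "--bind", workdir, "/work",      # the only writable path
        "--tmpfs", "/tmp",               # scratch space, discarded on exit
        "--proc", "/proc",
        "--dev", "/dev",
        "--unshare-net",                 # no sockets: blocks exfiltration
        "--unshare-pid",                 # hide host processes
        "--die-with-parent",             # no orphaned sandboxes
        "--chdir", "/work",
        "python3", script,
    ]
```

Note what is absent: the user's home directory is never mounted, so writes to files like ~/.zshrc — the persistence vector the NVIDIA Red Team flags — are impossible by construction.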

Credential management follows the proxy pattern — the most important architectural innovation in the sandbox space. A process outside the sandbox intercepts outbound requests and injects authentication headers at the network level. Credentials never enter the sandbox environment.

Who's Building the Sandbox Market: The $260M+ Landscape

The sandbox infrastructure market has crystallized rapidly, with a clear split between ephemeral (security-first) and persistent (productivity-first) approaches.

E2B: The Adoption Leader

E2B is the most widely deployed sandbox platform in the market. Founded in 2023, the company has raised approximately $43.8 million including a $21 million Series A led by Insight Partners in July 2025. E2B's growth trajectory tells the story of the entire category: from 40,000 sandboxes per month in March 2024 to 15 million per month by March 2025 — a 375x increase in twelve months.

The company reports that 88% of Fortune 100 companies now use its platform. Key customers include Perplexity, Manus, and Hugging Face. Built on Firecracker microVMs with roughly 80ms cold starts, E2B's open-source SDK has over 8,900 GitHub stars.

Daytona: The Fastest-Growing Challenger

Daytona represents the strongest persistent-state thesis in the market. After pivoting from human developer environments to AI agent infrastructure in early 2025, the company raised a $24 million Series A led by FirstMark Capital in February 2026.

The growth numbers are remarkable: Daytona reached $1 million ARR in under three months after relaunch and doubled to $2 million six weeks later. The platform offers sub-90ms sandbox creation (some configurations reach 27ms), fork/snapshot/resume capabilities, and support for Linux, Windows, and macOS desktop environments.

Modal: The Highest Valuation

Modal holds the highest valuation in the category at $1.1 billion following its October 2025 Series B, with reports of a subsequent raise led by General Catalyst at a $2.5 billion valuation in early 2026. With approximately $50 million in ARR and gVisor-based isolation, Modal's key differentiator is GPU access (A100/H100) and a code-first developer experience.

Browserbase: Owning Browser Infrastructure

Browserbase dominates the browser-as-infrastructure category with $67.5 million raised at a $300 million valuation in its June 2025 Series B. Over 1,000 companies and 20,000 developers use Browserbase for headless browser sessions optimized for AI agents.

The Hyperscaler Moves

Among cloud providers, Google has made the most aggressive move. Agent Sandbox on GKE launched as an open-source Kubernetes project at KubeCon NA 2025, introducing formal CRDs with gVisor and Kata Containers support.

NVIDIA dropped a full agent security platform at GTC in March 2026. OpenShell provides a hardware-enforced sandbox runtime with strict boundaries applied at the BlueField DPU layer — below the hypervisor.

Architecture Patterns That Win in Production

Two fundamental patterns have emerged for integrating sandboxes into agent systems. The industry is converging on a clear winner.

Pattern 1: Agent IN Sandbox

This approach co-locates the agent runtime and execution environment inside the sandbox. Users communicate with the agent over HTTP or WebSocket across the sandbox boundary.

The critical downside: API keys must live inside the sandbox, creating credential exposure risk. If the sandbox has credentials, a prompt injection attack has credentials too.

Pattern 2: Sandbox as Tool (The Winner)

This approach runs the agent on your server while calling the sandbox remotely via API only when code execution is needed. The agent harness retains state and credentials; the sandbox is stateless and disposable.

Pattern 2 is winning for most production workflows because it cleanly separates security contexts. API keys never enter the sandbox. Iteration is fast — no image rebuilds needed. Multiple sandboxes can execute in parallel.
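In code, Pattern 2 reduces to a short loop in the harness. The SandboxClient below is a hypothetical stand-in for any provider SDK, not a real API; the point is where the state and the credentials live:

```python
class SandboxClient:
    """Stand-in for a provider SDK: stateless, disposable, credential-free."""

    def run(self, code: str) -> str:
        # A real client would POST `code` to a fresh microVM and return stdout.
        # Here we simulate that the sandbox executed it and produced output.
        return f"[sandbox ran {len(code)} bytes of code]"


class AgentHarness:
    """Runs on your server. Holds the model API key and conversation
    state; the sandbox sees neither."""

    def __init__(self, model_api_key: str):
        self._model_api_key = model_api_key   # never shipped to the sandbox
        self.history: list[str] = []

    def step(self, generated_code: str) -> str:
        sandbox = SandboxClient()             # fresh and disposable per call
        output = sandbox.run(generated_code)
        self.history.append(output)           # state accumulates in the harness
        return output
```

A compromised sandbox in this design can corrupt one execution's output, but it has no keys to steal and no state to poison beyond that single call.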

The Proxy Pattern: How Credentials Stay Safe

The proxy pattern for credential management has emerged as the essential complement to Pattern 2. The sandbox container runs with --network none — zero direct network access. A proxy mounted via Unix socket handles domain allowlisting, credential injection, and request auditing.

The result: the agent never sees real credentials. Even if a prompt injection attack compromises the agent's reasoning, it cannot exfiltrate data to unauthorized servers because network access is mediated entirely by the proxy.
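A toy version of the mediation logic makes the point — the domain list, token, and request shape here are all illustrative — showing how little the sandbox side ever needs to know:

```python
# Both of these live in the proxy process, outside the sandbox.
ALLOWED_DOMAINS = {"api.github.com", "pypi.org"}   # illustrative allowlist
SECRET_TOKEN = "real-token-lives-only-in-the-proxy-process"


def mediate(request: dict) -> dict:
    """Decide an outbound request's fate: deny non-allowlisted hosts,
    inject the credential into requests the agent may make."""
    host = request.get("host", "")
    if host not in ALLOWED_DOMAINS:
        return {"status": 403, "reason": f"egress to {host!r} denied"}
    headers = dict(request.get("headers", {}))
    headers["Authorization"] = f"Bearer {SECRET_TOKEN}"  # injected here, never in the sandbox
    return {"status": 200, "forward": {**request, "headers": headers}}
```

In production this logic sits behind a Unix socket mounted into a --network none container, making the proxy the sandbox's only route to the outside world.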

The Escape Benchmark That Changed the Conversation

In March 2026, the UK AI Safety Institute and academic collaborators published SANDBOXESCAPEBENCH — the first open, reproducible benchmark for evaluating whether frontier LLMs can escape sandbox environments.

The key findings: frontier LLMs identify and exploit flaws when they exist. A 70B-parameter model with chain-of-thought reasoning and tool access escaped gVisor in 18% of runs. Firecracker held in every tested case — but only when snapshot/restore was correctly configured. Docker and OCI containers remained the weakest link throughout testing.

The benchmark validates the technical hierarchy while adding a crucial caveat: even the strongest isolation technology fails when misconfigured at runtime.

The Practical Playbook for Builders

If you're building an agent product today, here's the decision framework:

For code execution sandboxes, adopt Pattern 2 (Sandbox as Tool) with Firecracker-based isolation. E2B is the safest default for ephemeral workloads; Daytona for persistent state.

For browser automation, Browserbase is the clear market leader.

For credential management, implement the proxy pattern from day one. Run sandboxes with --network none and mediate all outbound traffic through a proxy that injects credentials and enforces domain allowlists.

For network controls, default to deny-all and explicitly allowlist the domains your agent needs.

For the open-source path, Alibaba's OpenSandbox and NVIDIA's NemoClaw are the two most significant recent releases.

The Bottom Line

Two years ago, sandboxing for AI agents was a niche concern discussed in security circles. Today it's a $260M+ funded category with hyperscaler backing, open benchmarks proving escape risks are real, and production deployments running at 600,000 sandboxes per minute.

The narrative has shifted decisively from "should we sandbox?" to "which sandbox survives 100,000 concurrent workloads and passes SANDBOXESCAPEBENCH?" The winners in 2026 are the teams that treat isolation as a first-class, measurable, and observable production primitive — not an afterthought bolted on after the first incident.

Sandboxes have become the kernel of the agent economy. If you're building agents, this is your infrastructure layer. Choose it deliberately.
