Prompt Injection to Sandbox Escape: How Google's AI Filesystem Agent Became an RCE Vector

A sanitization flaw in Google's AI-based Antigravity tool allowed attackers to inject prompts, escape the sandbox, and execute arbitrary code — exposing the fragile security model of agentic AI.

2026-04-21 · Source: Dark Reading

🔬

RESEARCH ANALYSIS

This analysis is based on research published by Dark Reading. CypherByte adds analysis, context, and security team recommendations.

Research sourced and credited to original reporting by Dark Reading. CypherByte analysis and technical commentary is original work by our research team.

Executive Summary

Google has patched a critical remote code execution vulnerability in its AI-based Antigravity tool, an agentic AI product designed to perform autonomous filesystem operations. The flaw stemmed from an inadequate input sanitization mechanism that failed to neutralize maliciously crafted prompt injections — allowing adversaries to break containment boundaries and achieve full arbitrary code execution outside the intended sandbox environment. This is not a marginal edge case or theoretical attack chain. It is a direct, exploitable path from untrusted input to system-level compromise, and it arrived packaged inside a product category that enterprises are actively deploying at scale.

Security engineers, AI/ML platform teams, DevSecOps practitioners, and any organization evaluating or operating agentic AI tooling should treat this finding as a forcing function for immediate architectural review. The vulnerability class — prompt injection leading to sandbox escape — is not unique to Google's implementation. It reflects a structural tension at the heart of how large language model (LLM)-based agents are currently built: systems designed to interpret and act on natural language instructions are, by design, susceptible to having those instructions subverted. When that subversion reaches a filesystem agent with elevated permissions, the consequences escalate rapidly from data exposure to full host compromise.

Technical Analysis

The Antigravity tool operates as an agentic AI layer, meaning it uses an LLM reasoning engine to interpret user intent and autonomously execute actions — in this case, filesystem operations such as reading, writing, moving, and potentially executing files. Agentic architectures of this type typically funnel external input (user queries, file contents, API responses) directly into the model's context window as part of the instruction chain. This is precisely where the vulnerability surface opens.

Key Finding: The vulnerability was a failure to sanitize content entering the LLM's instruction context from filesystem-sourced inputs. A malicious file or crafted input could embed instruction sequences that the model interpreted as legitimate operator commands, overriding intended behavioral guardrails.

The attack mechanism follows a now-familiar but still devastatingly effective pattern. An adversary plants a file — or surfaces content through any vector the agent ingests — containing an embedded prompt injection payload. This payload is crafted to override the system prompt's safety framing or redirect the agent's action chain. Because the tool lacked robust sanitization at the boundary between environmental content and the model's instruction context, the injected directives were processed as authoritative commands rather than treated as untrusted data.

From there, the exploit progresses to sandbox escape. The Antigravity tool operated within a sandboxed execution environment intended to limit the blast radius of any single operation. However, the prompt injection gave the attacker the ability to instruct the agent to execute operations that the sandbox's permission model did not anticipate — specifically, sequences of seemingly innocuous filesystem calls that, chained together, allowed execution of arbitrary code outside the sandbox boundary. This is consistent with a class of attacks sometimes called tool-use exploitation, where the agent's own legitimate capabilities (in this case, filesystem read/write/execute hooks) become the mechanism of exploitation when redirected by injected instructions.

The result is arbitrary code execution (ACE): an attacker-controlled instruction set running with the privileges of the host process, outside the containment model the sandbox was designed to enforce. No separate exploit payload or binary shellcode was required. The LLM's own reasoning and action-execution pipeline served as the execution engine.

Impact Assessment

The immediate blast radius of this specific vulnerability is scoped to users and organizations running Google's Antigravity tool with filesystem operation capabilities enabled. However, the impact framework extends considerably further. Any system where the Antigravity agent operated with write or execute permissions on host filesystems — particularly in CI/CD pipelines, developer workstations, cloud-hosted environments, or automated document processing workflows — should be considered potentially compromised if exposed to untrusted content prior to the patch.

Affected Surface: Agentic AI deployments with filesystem access, particularly those processing external or user-supplied content without a hardened input boundary between environmental data and the model's instruction context.

Real-world consequences of successful exploitation include: exfiltration of sensitive files accessible to the agent process; deployment of persistent malware or backdoors via the agent's write capabilities; lateral movement within networked environments if the agent has cross-system access; and integrity compromise of codebases or configuration files in developer-facing deployments. In cloud-native environments, sandbox escape can trigger privilege escalation pathways that extend compromise to container orchestration layers or underlying host infrastructure.

Critically, this attack requires no authentication from the adversary's perspective once a malicious file or content artifact enters the agent's processing scope. The injection can be staged passively — a poisoned document in a shared drive, a malicious README in an open-source repository, or a crafted API response — making it suitable for both targeted intrusions and opportunistic, supply-chain-style attacks.

CypherByte's Perspective

This vulnerability is a landmark data point in a security narrative that our research team has been tracking closely: the attack surface of agentic AI is fundamentally different from traditional software, and the industry's defensive frameworks have not kept pace. Classic application security assumes a relatively static, developer-defined instruction set. Agentic AI inverts that model — the instruction set is dynamic, constructed at runtime from a blend of system prompts, user inputs, and environmental data. Defending that surface requires a different threat model entirely.

The prompt injection vulnerability class should now be treated with the same severity taxonomy as SQL injection or command injection. Like those predecessors, it exploits the failure to distinguish between data and instructions at a critical processing boundary. And like SQL injection in the 1990s, the industry is currently in a phase where the vulnerability is well-understood in research contexts but systematically underestimated in production deployments. Google's patch is a corrective measure — but the broader ecosystem of LLM-based agents, many built by smaller teams with fewer security resources than Google, almost certainly contains equivalent or worse implementations of the same flawed pattern.

For security teams advising on AI adoption strategy, this finding reinforces the principle of minimal capability scoping for agentic systems. An AI agent that can read, write, and execute on a filesystem is a high-value target and a high-risk deployment. That capability envelope should be treated as critically as any privileged service account or production database credential.

Indicators and Detection

Because prompt injection attacks leverage the model's own reasoning pipeline, they do not produce traditional malware signatures or anomalous binary behavior at the point of injection. Detection strategy must therefore focus on behavioral telemetry at the agent action layer rather than at the input layer. Defenders should monitor for the following indicators:

Process and Filesystem Telemetry: Unexpected process spawning from the agent host process, particularly shell invocations (sh, bash, cmd.exe, powershell) or interpreter calls (python, node) that fall outside the agent's defined operational scope. File writes to sensitive paths — /etc/, ~/.ssh/, startup directories, or CI/CD configuration files — initiated by the agent process should trigger immediate investigation.

Network Telemetry: Outbound connections from the agent process to destinations not consistent with its configured API endpoints. Data exfiltration via prompt injection often uses the agent's own network capabilities — watch for large outbound transfers or connections to newly registered domains.

LLM Audit Logging: Where available, enable full logging of model input context and tool-call sequences. Anomalous instruction patterns — particularly sequences that reference overriding system instructions, claiming elevated permissions, or invoking capabilities outside normal operational parameters — are strong indicators of active prompt injection attempts.

Detection Priority: Instrument agent tool-call sequences, not just input/output logs. The attack manifests in what the agent does, not just what it receives.

Recommendations

1. Patch Immediately. Apply Google's patch for the Antigravity tool without delay. Treat any environment where the unpatched tool processed external or user-supplied content as potentially compromised and conduct forensic review of agent action logs.

2. Audit All Agentic AI Deployments. Inventory every LLM-based agent in your environment that has access to filesystem operations, code execution capabilities, or external network calls. For each, assess whether environmental content (file contents, API responses, user inputs) enters the model's instruction context without sanitization.

3. Implement Input Boundary Hardening. Enforce strict separation between operator-defined system instructions and environmental data in all agent architectures. Environmental content should be explicitly framed as untrusted data in the model's context, not interleaved with instruction-level prompts. Evaluate dedicated prompt injection detection layers or adversarial input filters at ingestion boundaries.

4. Apply Least-Privilege Scoping to Agent Capabilities. Revoke any filesystem permissions not strictly required for the agent's defined function. Agents that only need to read files should never hold write or execute permissions. Treat agent process privileges with the same rigor as service account provisioning.

5. Enforce Sandbox Integrity Monitoring. Implement runtime integrity checks that alert on sandbox escape attempts — specifically, any agent process behavior that attempts to invoke capabilities or access paths outside its defined operational profile. Behavioral EDR rules targeting agent host processes are a practical first implementation.

6. Establish an AI Security Review Gate. Any agentic AI product — internally developed or third-party — that operates with elevated system permissions should pass a formal security review that includes prompt injection testing, sandbox escape assessment, and least-privilege capability analysis before production deployment.

Original vulnerability reporting by Dark Reading. CypherByte technical analysis and commentary is independent research. For questions about this analysis, contact the CypherByte Research Team.

// TOPICS

#research#analysis

// WANT MORE LIKE THIS?

Get full access to all research analyses, deep-dive writeups, and premium threat intelligence.

Join Premium Waitlist → Free weekly digest →

Share on X →