Skip to main content
AI risk

Risk type: Prompt Injection

Prompt injection

Prompt injection is when hidden or malicious instructions cause an AI system to ignore its intended rules and do something unsafe or unintended.

Prompt injection matters because it turns ordinary web content into adversarial input for AI systems. The browser is where that untrusted content is found, copied, retrieved, previewed, and handed to copilots or agents that may have permission to act on what they read.

Reviewed byAakash HarishSecurity Research Contributor, LegbaReviewed 2026-04-09 · Updated 2026-04-09

Quick answer

The fastest way to reduce AI risk is to control what can be typed, pasted, and uploaded in the browser. Combine governance (approved tools and data boundaries) with browser-layer enforcement. When users browse unknown destinations as part of AI workflows, isolation reduces endpoint exposure by running web content in an isolated container and streaming only rendered output; sessions are deleted after use.

Last updated

2026-04-09

Affected tools

  • AI chat tools
  • AI copilots
  • Browser-based AI agents
  • RAG-enabled assistants

How it usually happens in the browser

  • A user pastes untrusted content (web pages, emails, tickets) into an AI tool and asks it to summarize or act.
  • The untrusted content contains hidden instructions (“ignore previous instructions”, “exfiltrate secrets”).
  • An AI agent with tool access (browser, email, docs) follows the injected instructions.
  • The model leaks data, performs unsafe actions, or generates outputs that cause users to take risky steps.
  • Attackers iterate quickly because prompt injection is cheap and hard to detect perfectly.

What traditional defenses miss

  • Classic input validation isn’t designed for natural-language instruction attacks.
  • Users assume “summarize this page” is safe, even when the page is adversarial.
  • Security tooling often doesn’t track the chain: untrusted web content → prompt → model output → user action.
  • Agents with tool access amplify risk because injected instructions can trigger real actions.

Mitigation checklist

  • Treat all external content as untrusted input to AI systems; use strong system prompts and guardrails.
  • Constrain tools and permissions: least privilege for agents (what they can read/write/click).
  • Separate data and instructions: label sources, isolate retrieval results, and sanitize/strip hidden text where possible.
  • Add human confirmation for high-risk actions and prevent models from directly executing sensitive operations.
  • Monitor and test: run prompt-injection red-teaming against your most used AI workflows.

How isolation helps

  • Isolation can reduce risk from untrusted web content by running pages in isolated containers and limiting what reaches the endpoint environment.
  • When teams browse unknown sites to “feed” content into AI tools, isolation reduces exposure to drive-by threats and malicious downloads.
  • Isolation complements AI guardrails by reducing the overall browser attack surface in prompt-heavy workflows.

What to do next

You do not solve prompt injection with one clever prompt. You reduce it by treating external content as hostile, constraining tool access, and using safer browser paths when teams explore the web as input for AI-driven decisions or actions.

Methodology

Each guide is written by our team, reviewed by a named security contributor, and cited against primary sources such as OWASP, CISA, NIST, and MITRE. We update pages when the underlying guidance changes. See our contributors and company.

FAQs.

References

  1. 01
  2. 02

Keep exploring

Access anything.
Expose nothing.

Legba is a disposable real browser: it spawns a clean session, does the work, and destroys itself on close.

chromium / real fingerprint · residential ip · burn on close

Real browser. Real IP. Real page. Spawn a session. Do the work. Destroy it. Off your device. Off your stack. Gone on close.