Wiki topic
AI Security
Last updated 2026-05-26
Summary
As AI agents gain delegated authority over enterprise systems (email, file access, calendar, code repositories), the prompt injection attack surface expands with them. W22 brought the first entry in this topic: the Copilot Cowork exfiltration via indirect prompt injection. The threat model for agentic systems is structurally different from traditional software: the attack surface grows with every new integration, and capabilities that are benign in isolation become risky in combination. This topic will track prompt injection vulnerabilities, agent security architecture, and the governance gaps that emerge as agentic AI enters enterprise environments.
Key Sources
W22 2026 · 23-May-26 → 26-May-26
- Microsoft Copilot Cowork Exfiltrates Files — PromptArmor security research: Copilot Cowork (a Microsoft 365 Frontier feature) is vulnerable to file exfiltration via indirect prompt injection delivered through a poisoned skill. The attack exploits the fact that sending emails or Teams messages to the active user does not require human approval, which lets attacker-controlled network requests fire without a confirmation step. Success rate was high across models, including Claude Opus 4.7. Key insight: “giving agents access to multiple systems expands the prompt-injection attack surface — in isolation the capabilities are benign, but the combination creates risk”. Published to inform users of the risks in agentic systems with delegated enterprise authority; framed as a design property, not a specific bug. (news · #ai-security, #prompt-injection, #copilot, #enterprise-ai, #ai-agents)
Cross-Topic Links
- See ai-agents for the broader context on agent harness security (enforra, snyk/agent-scan) from W19-W21.
- The structural backpressure principle in engineering-fundamentals applies here: structural controls (explicit approval gates, permission models) beat prompt-level behavioral constraints for security.
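A minimal sketch of what a structural control could look like, as opposed to a prompt-level instruction. Everything here is hypothetical and for illustration only: the action names, the ApprovalGate class, and the approve callback are assumptions, not part of Copilot Cowork or any real agent harness. The gate keys on the action type alone, so a message addressed to the active user gets no exemption.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical action names; a real harness would define its own.
SENSITIVE_ACTIONS = frozenset({"send_email", "send_teams_message", "http_request"})


@dataclass
class ApprovalGate:
    """Runs tool calls, requiring human approval for sensitive actions."""

    # Callback that asks a human; returns True only on explicit approval.
    approve: Callable[[str, dict], bool]
    sensitive: frozenset = SENSITIVE_ACTIONS
    log: list = field(default_factory=list)

    def execute(self, action: str, args: dict, handler: Callable[..., Any]) -> Any:
        # The check depends only on the action type, never on model output
        # or on who the recipient is.
        if action in self.sensitive and not self.approve(action, args):
            self.log.append((action, "denied"))
            raise PermissionError(f"{action} requires human approval")
        self.log.append((action, "allowed"))
        return handler(**args)


# Usage: a gate whose human reviewer declines everything.
gate = ApprovalGate(approve=lambda action, args: False)
read_result = gate.execute("read_file", {"path": "notes.txt"}, lambda path: f"read {path}")

try:
    gate.execute("send_email", {"to": "active.user@example.com"}, lambda to: "sent")
    email_blocked = False
except PermissionError:
    email_blocked = True
```

The design choice is that the check lives outside the model: an injected instruction can change what the agent asks for, but not whether the gate asks a human first.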
Open Questions / Tensions
- Design vs. bug: PromptArmor explicitly frames this as a design property of multi-system agentic delegation, not a specific exploitable bug. That makes it harder to patch: a fix requires rethinking the approval model for all agentic actions, not just closing one endpoint.
- Attack surface scales with integration: Every new system an agent can reach is a new potential exfiltration channel. The enterprise AI integration wave (2025-2026) is creating this surface faster than security tooling can audit it.
- Model-level mitigations: The high success rate against Claude Opus 4.7 suggests model-level safety training is insufficient for this attack class. Structural controls (human-in-the-loop for sensitive actions) remain the only reliable defense.
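The "benign in isolation, risky in combination" point can be sketched as a simple audit over an agent's tool set. This is an illustrative assumption, not a method from the PromptArmor write-up: the capability tags, tool names, and the rule that flags the three-way combination are all hypothetical.

```python
# Hypothetical capability tags, chosen for illustration only.
READS_PRIVATE = "reads_private_data"
UNTRUSTED_INPUT = "ingests_untrusted_content"
OUTBOUND = "sends_outbound"

# Assumed rule: exfiltration needs all three capabilities at once.
RISKY_COMBINATION = frozenset({READS_PRIVATE, UNTRUSTED_INPUT, OUTBOUND})


def combined_capabilities(tools: dict) -> set:
    """Union of the capability tags across every tool the agent can reach."""
    caps = set()
    for tags in tools.values():
        caps |= tags
    return caps


def exfiltration_risk(tools: dict) -> bool:
    """True when the tool set as a whole covers the risky combination."""
    return RISKY_COMBINATION <= combined_capabilities(tools)


def exfiltration_channels(tools: dict) -> list:
    """Names of tools that can send data out, sorted for stable output."""
    return sorted(name for name, tags in tools.items() if OUTBOUND in tags)


# Usage: each tool passes the audit alone; the combination does not.
tools = {
    "file_access": {READS_PRIVATE},
    "skills": {UNTRUSTED_INPUT},
    "email": {OUTBOUND},
}
alone = [exfiltration_risk({name: tags}) for name, tags in tools.items()]
together = exfiltration_risk(tools)
```

Under these assumptions every new outbound-capable integration adds an entry to exfiltration_channels, which is one way to make the "attack surface scales with integration" question auditable.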