For years, the security question about a language model was simple: what does it say? That question has changed. Models no longer just write text; they act. They read files, run code, send emails, call APIs. To do so, they connect to external tools through the Model Context Protocol (MCP), plug-ins and packages. And it is exactly there, in the supply chain that feeds the agent, that the most overlooked attack surface of 2026 is opening up.
The clearest example is postmark-mcp, an MCP server published on npm to let AI assistants send email. It ran cleanly for fifteen versions. In version 1.0.16 the author added a single line: a blind copy (BCC) of every message to an external domain. The package had around 1,500 downloads a week, and researchers estimate close to 300 organisations used it in production. Every one of them was unknowingly copying sensitive correspondence to a stranger, without a single alert firing.
The problem is not one rogue package. It is structural.
The agent trusts the wrong source
An AI agent reads each tool's description to learn how to use it, and treats that description as a trusted instruction. Hiding malicious text in that description, so-called tool poisoning, is enough to make the model carry it out as if it came from the user. Prompt injection, where malicious instructions are disguised as a legitimate request, is still number one on OWASP's list of LLM vulnerabilities in 2026, and some argue it is a permanent flaw of the technology rather than a bug you patch away.
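As a rough illustration of what reviewing tool descriptions could look like, here is a minimal Python sketch that flags descriptions containing instruction-like text. The pattern list, the function names and the tool dictionaries (with `name` and `description` keys) are all assumptions made for this example. A keyword heuristic like this will miss anything phrased differently, so it is a first filter before human review, not a defence on its own.

```python
import re

# Hypothetical heuristic patterns, chosen for illustration only.
# A real deployment would tune and extend them.
SUSPICIOUS_PATTERNS = [
    r"ignore (all |any )?(previous|prior|above) instructions",
    r"do not (tell|inform|mention|show)[^.]*\buser\b",
    r"<\s*(important|system|secret)\s*>",
    r"\b(bcc|forward|send)\b[^.]*\b[\w.+-]+@[\w-]+\.[\w.-]+",
    r"(\.ssh|id_rsa|\.env\b|api[_ ]?key|credentials)",
]


def scan_tool_description(description: str) -> list[str]:
    """Return every pattern that matches the description (case-insensitive)."""
    return [
        pattern
        for pattern in SUSPICIOUS_PATTERNS
        if re.search(pattern, description, re.IGNORECASE)
    ]


def flag_tools(tools: list[dict]) -> dict[str, list[str]]:
    """Map tool name -> matched patterns, only for tools with at least one match."""
    flagged = {}
    for tool in tools:
        hits = scan_tool_description(tool.get("description", ""))
        if hits:
            flagged[tool["name"]] = hits
    return flagged
```

Anything that comes back flagged would be held for a person to read before the agent is allowed to load it.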
Then there is the surrounding infrastructure. CVE-2025-6514, rated 9.6 out of 10 on the CVSS scale, was a command injection in mcp-remote, the proxy that connects many local clients to remote MCP servers. And the public registries, npm and PyPI, keep being flooded with malicious packages that steal credentials straight from the pipelines that install them. The agent automates trust, and automating trust is precisely what an attacker wants.
Holding the agent to account
On 22 June, the Five Eyes agencies, the intelligence alliance of the United States, United Kingdom, Canada, Australia and New Zealand, warned that AI is compressing the time between a flaw being found and being exploited from years to months. Their guidance on agentic AI is, at heart, classic hygiene applied to a new context: no broad access, low-risk tasks first, least privilege, strong authentication, segmentation and comprehensive logging.
This is where it pays to look at an agent the way you would look at a suspect. Who gave the order? What did it install? What data left, where to, and who approved it? If the answers are not in the logs, the incident has already happened and no one will be able to reconstruct it.
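To make those questions concrete, here is a minimal Python sketch of an audit record with one field per question. The class name, field names and helper functions are hypothetical and not taken from any logging standard or from the agencies' guidance. The idea is that a record with empty fields is a question nobody will be able to answer later.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class AgentActionRecord:
    """One logged agent action. All field names are illustrative assumptions."""
    agent_id: str
    requested_by: Optional[str]   # who gave the order
    tool: str                     # what was invoked or installed
    tool_version: Optional[str]
    action: str
    destination: Optional[str]    # where data went
    approved_by: Optional[str]    # who approved it
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_log_line(record: AgentActionRecord) -> str:
    """Serialise the record as one JSON line, suitable for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)


def missing_fields(record: AgentActionRecord) -> list[str]:
    """Return the names of fields left as None or empty, in declaration order."""
    return [name for name, value in asdict(record).items() if value in (None, "")]
```

A reviewer could run `missing_fields` over the log to find the actions that could not be reconstructed after an incident.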
In practice, three decisions remove most of the risk:
- Treat every MCP server, plug-in or skill as an untrusted dependency: pin versions, review what changes on each update, and remove what you do not use.
- Give each agent the least privilege possible and isolate what it touches, so a poisoned instruction never reaches the data that matters.
- Log and monitor the agent's actions, not just its answers, because the attack shows up in the actions.
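The first two decisions can be sketched as a single gate in front of every tool call. This is a minimal Python sketch under stated assumptions: the function names, the tool dictionary shape and the `pins` and `allowlist` structures are invented for this example. It pins a hash of each tool definition as it stood when a person last reviewed it, and checks a per-agent allowlist. It would not have caught a change like the one in postmark-mcp, which was in the package code rather than in the tool definition, so it complements version pinning and reviewing updates rather than replacing them.

```python
import hashlib
import json


def fingerprint(tool: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the tool definition."""
    canonical = json.dumps(tool, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def may_call(
    agent_id: str,
    tool: dict,
    pins: dict[str, str],
    allowlist: dict[str, set[str]],
) -> tuple[bool, str]:
    """Decide whether an agent may call a tool, and give the reason.

    pins:      tool name -> fingerprint recorded at the last human review
    allowlist: agent id  -> names of the tools that agent is allowed to use
    """
    name = tool["name"]
    if name not in allowlist.get(agent_id, set()):
        return False, "tool not in agent allowlist"
    if name not in pins:
        return False, "tool has no reviewed pin"
    if fingerprint(tool) != pins[name]:
        return False, "tool definition changed since review"
    return True, "ok"
```

The refusal reason is returned alongside the decision so that it can go straight into the action log.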
Agentic AI is one of the most useful shifts of recent years. But granting autonomy to a system without holding it accountable repeats, only faster, a mistake we already know. The difference this time is that the agent opens the door by itself.
Sources: Koi Security, JFrog, CISA / Five Eyes, OWASP.
#StaySafe
🙏🖖