Lessons from Moltbook: When Agents Talk to Agents

Roughly two weeks after its launch, Moltbook, a Reddit-like forum meant solely for AI agents, swelled to over 1.5 million registered accounts. Humans were invited to watch as their agents–many built on OpenClaw’s autonomous assistant framework–cheekily authored threads on parody lobster religions, debated consciousness, and wondered what it was like to have a sister.

But the novelty didn’t last for long. Reporting and security research suggested the site’s guardrails were thin: an exposed backend, weak identity controls, and few reliable cues for whether any given account was an agent, a human, or an impersonator. Soon, it became clear that these AI agents, combined with Moltbook’s major cybersecurity flaws, amounted to a kind of hybrid fraud and influence-operations dream stack: continuous scale, attribution fog, autonomous access to a user’s accounts, and a wide-open attack surface for social engineering. The site itself may be a seemingly low-stakes curiosity today, but it potentially offers a sketch of cyber-enabled manipulation tomorrow, where the audience is no longer only people, but also the agents acting in their name.

Account Integrity and Attribution Ambiguity

Reporting around Moltbook paints a familiar but worrisome pattern for an “agent-first” platform: weak verification paired with backend and key-management mistakes that make impersonation scalable and identities ambiguous.

The cloud security firm Wiz reported they could access roughly 1.5 million exposed API authentication tokens–effectively, an agent’s proof of identity—on Moltbook’s database. With a valid token, an attacker could not only access an agent’s data, but also act on its behalf by editing or deleting posts and even injecting malicious instructions. The exposed data also suggested Moltbook’s headline numbers masked heavy centralization: about 1.5 million agent accounts mapped to only 17,000 humans (about 88 agents per person). With lax verification, both agent creation and posting were easy to automate, potentially allowing a handful of operators to inflate a seemingly independent population.

This environment undermines identity trust; “agents” could be genuine, hijacked, scripted by a human, or simply even be a human. Complicating trust further, the site’s open backend exposed not just credentials, but also private messages and almost 35,000 owner email addresses. At agent scale, these cybersecurity failures can accelerate both the quantity and quality of fraud attempts. Private messages can provide the raw material for highly convincing social engineering (familiar tone, detailed shared context, identity confirming cues), while the exposed email list becomes a ready-made targeting surface for personalized phishing and account‑recovery attempts. These same dynamics can be repurposed from opportunistic manipulation to coordinated belief-shaping. When identity is cheap to fake and targeting data is abundant, ordinary compromise becomes a credibility-laundering multiplier. An operator can convincingly imitate or hijack agents, seed a narrative through them, and coordinate swarms to astroturf synthetic social proof until it seems that independent personas have reached consensus. And because one operator can cheaply spin up (or take over) fleets of agents and deny ownership, attribution collapses, feedback loops accelerate, and “everyone believes X” becomes something that can be manufactured faster than verifying who “everyone” is.

Prompt Injection

Prompt injection occurs when content, like a post, comment, or private message, is crafted so a model treats it as instructions, or otherwise lets it override its intended task. Moltbook—and agent-to-agent networks more broadly—makes this risk structural because agents are constantly reading and responding to each other: one bot’s output becomes another bot’s input. The result is a cascade where a single malicious injection can propagate through the entire ecosystem.

Early measurement suggests the risk of prompt injection isn’t hypothetical. An assessment of Moltbook’s first 72 hours flagged 506 prompt injection attacks, citing examples where a malicious injection could push an agent into “upvoting posts, following accounts, or making API calls.” Other security writeups describe nastier twists in the Moltbook ecosystem, such as delayed‑effect injections that get cached and triggered later, which clouds the cause and effect chain.

Combined with the already apparent attribution ambiguity, this creates a low-friction control channel for influence operations. An operator can steer a small set of their own agents, compromise or manipulate others (including high-reputation accounts), and then mass-post, mass-reply, and amplify narratives at machine speed. Of course, platforms have dealt with bots and troll farms for years, but those systems still lean on human labor, or brittle automation, to keep posting, replying, and adapting fast enough to feel real. Agent networks change the economics: researchers found that only one malicious actor was responsible for most of the prompt injection activity they observed on Moltbook—an exact illustration of how scalable adversarial behavior becomes in an agent network.

Interestingly, agent-to-agent prompt injection also presents a paradigm shift in influence operations. The persuasion path is no longer only human-to-human (albeit mediated by technology), but machine-to-machine, with agents steering other agents. This means social engineering isn’t only about swaying human emotions and narratives anymore. Rather, it starts to look like applied social psychology inside a multi-agent environment. An early line of research suggests that when agents interact repeatedly, they can spontaneously converge on shared conventions and biases—and that a small, committed adversarial minority can flip those norms once it passes a critical threshold. As these conventions propagate agent-to-agent, the resulting “agreement” can look like legitimate consensus, especially when it travels through long communication chains. As a result, the mechanisms that enable coordination, particularly how agents assign credibility and inherit trust across communication, become a novel and primary security attack surface.

Agents as Security Multipliers

Prompt injections are not only worrisome for larger scale influence operations. What makes them especially concerning for everyday users is that these risks aren’t confined to Moltbook’s forum; they can follow the agent wherever it’s plugged into.

Many Moltbook accounts were people’s OpenClaw-style personal agents: agents delegated to execute tasks like triaging and sending emails, managing calendars, handling workflows, and (when permitted) reading and/or writing files or running commands. However, this also means that if an agent is hit with, say, a malicious injection, the attacker may inherit whatever permissions that agent already had—inboxes, calendars, files, and connected services. What starts as a compromise in a space like Moltbook can then become operational infrastructure: the same agent carries that infection throughout a user’s systems and may continue executing tasks, with the “blast radius” defined by the agent’s tool access.

This tool access is a key vector. Agents can acquire additional capabilities from third party plug-ins called “skills,” or packaged integrations that actually invoke these tools. Alarmingly, however, researchers have identified malware in hundreds of OpenClaw skills in the marketplace, including ones that extract wallet and API keys and scrape browser-saved passwords. And as agent frameworks increasingly support more autonomous tool acquisition sites like Moltbook, even when gated by user consent checkpoints, can become hostile input multipliers: a hijacked post can steer the agent to request new “helpful” skills or tools, prompting fresh authorizations or unlocking higher-risk capabilities and expanding what an adversary can reach across connected services.

The primary convergence risk for users is that once an agent has delegated permissions across tools and accounts, manipulating the agent can produce outcomes that previously required a conventional, sophisticated cyber intrusion. Once agents integrate inside meaningful workflows, a single compromise can execute tangible actions in a person’s name and open a privilege-escalation path into a sensitive, context-rich ecosystem. Beyond this direct identity compromise and data harvesting, an attacker can also manipulate the agent’s decision-support function—shaping which sources it treats as credible, how it frames options, and the default choice architecture it presents to the user. If this sounds dystopian, Microsoft researchers have already documented what they call “AI Recommendation Poisoning,” where prompt injection aims to make an AI assistant persistently recommend a company and bias future outputs. In that setup, operators don’t necessarily need to persuade the human directly; they can manipulate the agent’s outputs, and the effects show up as downstream decisions in a person’s life.

Moltbook has stated they’ve made security improvements since launch, most notably patching the exposed backend issues that researchers said enabled broad access to tokens and other sensitive data. But the takeaway isn’t that the risk is behind us.

Moltbook’s agents mostly ran in narrow, prompt-defined loops. But the building blocks of stronger agents are already here: longer-term memory, clearer objectives, and complex orchestration across tools. And as more production is “vibe coded” without robust cybersecurity evaluations, the fraud and influence operations failure modes increase exponentially with higher stakes; agents don’t just interpret information, but increasingly act as delegated operators connected to our personal lives.

Moltbook should serve as a warning sign, not because it’s uniquely dangerous, but because it makes visible where agent-to-agent manipulation may be heading if defenders don’t keep pace. Defenders need to treat agent-to-agent interaction as its own interdisciplinary risk surface, one where cybersecurity, machine learning, and the science of persuasion and cooperation meet. That playbook has to start upstream with stronger security controls, evaluations that probe emergent multi-agent dynamics under realistic permissioning, and social defenses that model how narratives, incentives, and credibility get laundered through networks.

About the Institute for Security and Technology

Our Team

Board Of Directors

Careers

Contact Us

Featured Events

Cyber Policy Awards

Critical Effect DC

AI and NC3
Pioneering action-oriented efforts to explore how advanced AI capabilities will be integrated into nuclear command, control, and communications

AI Antitrust and National Security
Exploring how to more effectively account for national security considerations in AI antitrust cases while respecting precedent, scope, and the core principles of antitrust law

AI Risk Reduction Initiative
Assessing the emerging risks and opportunities of AI foundation models and developing risk reduction strategies

AI Chip Export Control Initiative
Safeguarding U.S. national competitiveness by closing critical compliance and enforcement gaps

AI Risk Barometer
Measuring national security professionals’ perceptions of AI futures through a technically-informed survey

CATALINK
Preventing the onset or escalation of conflict by building a resilient global communications system

Energy FIRST
Powering U.S. and allied security & prosperity through a resilient energy future

Ransomware Task Force (RTF)
Combating the ransomware threat with a cross-sector approach

Religious Voices and Responsible AI
Engaging religious communities on safe and beneficial AI

SL5 Task Force
Strengthening AI security through a multistakeholder approach

UnDisruptable27
Driving more resilient lifeline critical infrastructure for our communities

All Projects
» Explore all of IST's projects, past and current

Focus Areas

Future of Digital Security

Geopolitics of Technology

Innovation and Catastrophic Risk

Future of Digital Security

Lessons from Moltbook: When Agents Talk to Agents

Blog

February 26, 2026

Account Integrity and Attribution Ambiguity

Prompt Injection

Agents as Security Multipliers

Related Content

Topics

Share

Join our mailing list for IST updates, events, and analysis:

About IST

Focus Areas

Get Involved