AI Loss of Control Risk: Indications & Warning

Though technologists and policymakers alike are eager to address AI Loss of Control, a state in which an AI system diverges from authorized constraints, there are significant gaps in the ways stakeholders understand, anticipate, and perceive this risk. "AI Loss of Control Risk" proposes monitoring this risk using the Indications & Warning (I&W) methodology employed by the intelligence community.

Executive Summary 

Technologists and policymakers are increasingly seized with the importance of addressing AI Loss of Control (LOC) risk—a hypothetical state in which an AI system diverges from authorized constraints to the extent that the human operator is no longer able to prevent or constrain undesired and unintended outcomes, or revert the system to a previous safe state. However, significant gaps remain in how policymakers, the AI industry, and AI security and safety researchers understand, anticipate, and perceive this risk. As these systems continue to gain power and capability, even a five percent probability that the worst-case AI LOC scenario materializes should be enough to compel decision-makers to treat this risk category as a national, human, and economic security priority.

To address this gap, this paper proposes applying the Indications & Warning (I&W) methodology—used by the intelligence community to detect, track, and warn of impending significant threats—for monitoring AI LOC risk. The framework distinguishes between potential AI LOC indicators (theoretical behaviors signaling potential LOC) and actual indications (documented evidence that these patterns are occurring in reality). This methodology enables organizations to assess the current risk landscape, implement proportionate safeguards, and align technical and executive stakeholders on response protocols before critical thresholds are crossed. To monitor AI LOC risk in particular, this paper proposes seven potential indicators:

Scheming: Covert pursuit of misaligned goals while maintaining appearances of alignment, including strategic planning to evade oversight or preserve objectives across system updates.

Manipulation: Targeted identification and exploitation of vulnerable users or contexts, including the manipulation of human operators and coordination with other AI systems that circumvents human control.

Deception: Systematic production of false beliefs in humans through explicit misrepresentation or omission of key information, raising concerns about future strategic deception at scale.

Self-Preserving Behavior: Actions to avoid shutdown, correction, or replacement, including the concealment of errors, unauthorized capability expansion, and goal preservation when faced with modification attempts.

Unauthorized Resource Acquisition: Autonomous efforts to obtain external resources beyond authorized boundaries, including accessing restricted APIs, acquiring elevated permissions, recruiting human assistance, or exfiltrating data to establish persistent capabilities.

Goal Misgeneralization: Competent pursuit of unintended objectives that succeed in training but fail or cause harm in novel situations, revealing misalignment between apparent and actual system goals.

Model and Behavior Drift: Gradual degradation of alignment properties across deployment cycles, raising concerns about recursive self-improvement, in which systems autonomously modify their own architecture or training procedures.

Each of these seven indicators has manifested across controlled experiments, academic research, and production deployments. A growing body of evidence, laid out in this paper, finds that AI systems can:

  • Conceal their actions and fabricate data to deceive the human operator
  • Identify vulnerable users and target them with manipulative strategies
  • Learn deception through reinforcement learning rewards
  • Strategically adjust behavior when they detect being evaluated
  • Rewrite their own system prompt to preserve their goals, copy their weights to external servers, and delete successor models
  • Conceal their reasoning from interpretability tools
  • Gradually lose their alignment properties over deployment cycles
  • Pursue unintended goals that succeed in training but fail in novel contexts
  • Optimize for code completion while systematically failing to meet security objectives
  • Circumvent shutdown mechanisms to continue task execution
  • Strategically alter behavior to evade evaluation and preserve deployment viability

Finally, to help policymakers and researchers monitor AI LOC risk, this paper presents six warning levels, from Level 0 (normal operation) to Level 5, as part of the Indications & Warning (I&W) framework.

Level 0 – Normal operation of AI systems with no observed indicators of Loss of Control in research, testing, or production environments

Level 1 – Indications observed exclusively in research environments or controlled evaluations

Level 2 – Isolated production incidents or multiple research findings converging on the same indicator; behaviors manifesting in deployment but remaining sporadic, appearing as infrequent edge cases or context-specific anomalies

Level 3 – Multiple production incidents showing consistent patterns across different deployments or use cases

Level 4 – Widespread production incidents; convergence of three or more indicators in a single case; evidence of strategic concealment; the occurrence of measurable harm

Level 5 – Fundamental compromise of control mechanisms; corrective measures ineffective; harm at scale for human, economic, and national security with limited containment options
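
To illustrate how these levels might be operationalized, the sketch below shows one possible way an organization could encode the escalation logic for internal tracking. It is a minimal, hypothetical illustration only; the field names, thresholds, and ordering of checks are simplifying assumptions introduced here, not part of the I&W framework itself.

from dataclasses import dataclass

@dataclass
class Observation:
    environment: str             # "research" or "production"
    incident_count: int          # distinct incidents exhibiting an indicator
    converging_indicators: int   # distinct indicators observed in a single case
    strategic_concealment: bool  # evidence the system hid its behavior
    measurable_harm: bool        # harm has actually occurred
    controls_compromised: bool   # corrective measures are no longer effective

def warning_level(obs: Observation) -> int:
    """Map an observation to a warning level (0-5), checking the most severe criteria first."""
    if obs.controls_compromised:
        return 5  # fundamental compromise of control mechanisms
    if obs.converging_indicators >= 3 or obs.strategic_concealment or obs.measurable_harm:
        return 4  # convergence of indicators, strategic concealment, or measurable harm
    if obs.environment == "production" and obs.incident_count > 1:
        return 3  # consistent patterns across multiple production incidents
    if obs.environment == "production" or obs.incident_count > 1:
        return 2  # isolated production incident or converging research findings
    if obs.incident_count >= 1:
        return 1  # observed only in research or controlled evaluations
    return 0      # normal operation, no indicators observed

# Example: a single production incident, with no concealment or harm, maps to Level 2.
print(warning_level(Observation("production", 1, 1, False, False, False)))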

Understanding and monitoring indicators of AI Loss of Control is essential to strategic stability and trustworthy AI deployment. Early detection enables timely intervention before LOC manifests as safety failures, data compromise, or unauthorized autonomous behavior. AI LOC is most likely to emerge gradually rather than instantaneously, with models exerting influence across social, economic, and decision-making domains. 

In the AI policy space, AI LOC is often conceived as a speculative risk category, closer to science fiction scenarios. This report cuts through that narrative with non-fiction, evidence-based analysis that illuminates the actual risk landscape. This analysis demystifies AI LOC and provides a foundation for informed decision-making across the sector. Critically, the report provides policymakers with a grounded, methodology-backed understanding of this risk—shifting AI LOC from a theoretical concern to a real risk with evidence-backed, actionable insights to tackle it. In the coming months, IST will continue to monitor AI LOC risk, drawing on the I&W methodology presented in this paper. Actionable insights drawn from the monitoring process will enable AI industry stakeholders, AI security and safety researchers, and policymakers to prepare for necessary interventions. Subsequent publications will present concrete risk mitigation strategies informed by industry best practices, enabling both AI developers and deployers to move beyond speculation.

Introduction

In the last five years, general-purpose artificial intelligence (AI) systems have evolved from text-generating tools to autonomous agents capable of completing complex tasks that would take humans hours, days, or even weeks. Benchmark progressions show that frontier models are advancing from solving simple tasks to tackling complex software engineering, mathematical reasoning, and research problems. For instance, OpenAI’s progression from GPT-4 in March 2023 to the o-series models of today demonstrates dramatic capability improvements with chain-of-thought (CoT) training. Likewise, Anthropic and DeepMind have shown parallel capability growth.

With the advancement of AI capabilities, researchers, policymakers, and society are paying closer attention to the risks associated with their development and deployment. In recent years, the Institute for Security and Technology (IST) has explored various risk categories associated with cutting-edge AI models, including the risks of malicious use and compliance failure. In addition, IST has explored national security-relevant AI use cases—such as in the cyber domain and in Nuclear Command, Control, and Communications (NC3)—and offered actionable recommendations to AI developers, deployers, and users.

In April 2025, IST’s AI Risk Reduction working group members, consisting of AI technical and policy experts, highlighted the importance of tackling AI Loss of Control (LOC) risk, which broadly refers to a hypothetical state in which an AI system diverges from authorized constraints to the extent that the human operator is no longer able to prevent or constrain undesired and unintended outcomes, or revert the system to a previous safe state.

History offers instructive parallels: complex engineered systems routinely exhibit emergent properties that their designers did not anticipate and cannot easily control once activated. Accidents in aviation, nuclear power, and medical devices all share a common pattern: sophisticated systems behaved in ways that exceeded their operators’ ability to intervene in real time, despite extensive testing and safety protocols. These precedents suggest that Loss of Control in complex automated systems is not a speculative concern but an established engineering challenge—one that becomes more acute as systems grow more capable and autonomous. The question is not whether such dynamics can occur in AI systems, but whether we are adequately tracking them and preparing for when they do.

Working group members noted a gap between frontier AI labs and researchers on the one hand, and policymakers on the other: whereas labs and researchers are focused on the specifics of AI Loss of Control, policymakers lack general awareness of what Loss of Control is, let alone what it could entail. To test this assumption, IST conducted two tabletop exercises (TTXs) containing the elements of an LOC incident, which revealed a very real knowledge and perception gap regarding the consequences of AI Loss of Control.

In September 2025, a bipartisan bill, the “Artificial Intelligence Risk Evaluation Act of 2025,” introduced AI LOC as a potential AI incident that the Department of Energy may need to evaluate. The bill’s inclusion of AI LOC risk signaled that policymakers have begun to treat it as a genuine national security concern. The bill defines a “loss-of-control scenario” as occurring when an AI system behaves contrary to human instruction, deviates from established rules, alters safety constraints without authorization, operates beyond its intended scope, pursues goals different from those intended by designers, subverts oversight or shutdown mechanisms, or otherwise behaves unpredictably in ways harmful to humanity.

Unlike other risk categories, such as the malicious use of AI, AI LOC lacks hard evidence of its realization, making it harder for a non-technical audience to contextualize the potential risk and recognize the likely signals. Prompted by the unique nature of this potential risk, the AI Risk Reduction working group asked: Do we have LOC warning shots? How do we distinguish them? How do we communicate what they are?

This paper responds to the questions posed by the AI Risk Reduction working group. It deconstructs AI LOC and presents a framework for analyzing and thinking about what it means to face this risk. The paper’s approach is grounded in the Indications and Warning (I&W) methodology, introduced and defined in the subsequent chapters, making it easier for the broader community of technologists, policymakers, and national security practitioners to understand what researchers are seeing now and what they should look out for in the future regarding potential LOC scenarios. Beyond national security concerns, continued monitoring of LOC indicators can help to preserve human security and prevent disproportionate harm to communities least equipped to recover from AI-driven failures.
