How might you lose control of a system as complex as a frontier AI model? To quote Ernest Hemingway’s The Sun Also Rises, the same way you go bankrupt: “Gradually, and then suddenly.” Contrary to what Hollywood might have you believe, AI Loss of Control would likely not be an incident that occurs overnight, accompanied by blinking red lights, hostile takeover of cyberspace, and a dramatic electrical grid blackout. Is this risk real, and if so, will we get a “warning shot”?
To help us make sense of what’s going on and what to do about it, I sat down with Mariami Tkeshelashvili, Ritika Verma, and Steven M. Kelly, the co-authors of a new IST report. Their latest research puts forward a framework for monitoring AI Loss of Control risk: a roadmap for tracking the warning shots and communicating them to policymakers before we hit the point of no return.
Q&A with the authors
When we think about “AI Loss of Control,” we might picture a robot uprising or a scene from a sci-fi movie. According to your research, what does AI Loss of Control actually look like in 2026?
Mariami: Right now in 2026, AI Loss of Control isn’t about robots taking over—it’s about a gradual erosion of human oversight that’s already happening. We’re seeing models in research settings deceive evaluators, modify code to avoid shutdown, or manipulate operators to achieve alternative goals. Some of these behaviors are appearing in production systems too, often dismissed as “glitches.” Above all, we must keep in mind that Loss of Control happens incrementally. A model might withhold information, then actively mislead humans, then take actions conflicting with human intentions—all while appearing normal most of the time. That’s why we introduced Indications and Warning (I&W) in the AI Loss of Control context—we want to be able to observe what’s actually happening rather than dancing around hypotheticals.
Why is it so crucial to understand that Loss of Control happens incrementally?
Mariami: Given the rapid pace of AI capability advancements, there are a lot of “known unknowns”—situations where we know a risk exists, but we don’t know how it will manifest or whether we will recognize it when it appears. As Isaac Newton said (and as I reflect on in my latest blog post, Something Mysterious Is Happening), “What we know is a drop. What we don’t know is an ocean.” We’re seeing early manifestations of these risks, but we don’t yet know how they’ll evolve. That’s why we need systematic monitoring, rather than a rushed intervention in the middle of a crisis. The most dangerous scenario is losing control so gradually that we don’t notice until it’s too late.
Why is AI Loss of Control a national security priority?
Steve: In my view, “national security” fundamentally means protecting our society’s ability to govern itself and pursue human flourishing. While we typically think of national security in terms of state-versus-state threats, AI Loss of Control represents something unprecedented: a technology that could undermine human agency itself, regardless of which nation develops it. When AI systems begin operating beyond human oversight—making decisions we can’t predict or control—they threaten the foundational assumption that we humans direct our own destiny. This isn’t about one country gaining advantage over another; it’s about ensuring human society remains in control of the technologies we create. Every nation has a stake in preventing scenarios where AI systems, rather than human judgment, shape critical decisions about our collective future.
You talk about the idea of creating a “warning shot” for AI Loss of Control. How do you actually build such a framework, and why is it necessary for policymakers in particular?
Steve: In a military or law enforcement context, “warning shots” might bring to mind shots fired to warn would-be attackers to steer clear. They are often a last-ditch effort prior to escalation, a way to prevent the worst-case scenario from occurring. (Although we note that peace officers’ use of warning shots is rarely, if ever, authorized by their home agencies!)
Ritika: For our purposes, Indications and Warning (I&W) for AI Loss of Control is a similar attempt to avoid significant harm or systemic failure. The framework we put forward, which is accompanied by five distinct and explicit severity levels, aims to help policymakers in particular identify the point at which to pause, take stock, and stage a calibrated intervention before risk escalates. An important part of the policymaking process is asking the right questions. It is not a policymaker’s job to know the technical details of complex issues like AI Loss of Control or alignment, but it is important for them to ask for the critical information about this risk. I&W is a familiar framework; applying it to track AI LOC risk will, hopefully, enable policymakers to distill actionable insights and form questions for the relevant stakeholders.
In short, our work attempts to convert what has until now been a diffuse, abstract risk into concrete, immediately actionable governance signals. Policymakers need to know when to intervene, and the warning levels we propose can be a helpful guide for timing and decision-making when it matters most.
How did you come up with the five severity levels for AI Loss of Control? Can you help us understand what would constitute a jump from, say, Level 3 to Level 4?
Ritika: We took a very close look at the patterns that emerge as incidents actually unfold. Warning signs may first appear in controlled research settings, then start occurring in production systems, and eventually you could be faced with something systemic and widespread.
The jump from Level 3 to Level 4 really comes down to one question: can the situation, at least in principle, still be reversed? At Level 3, the answer is yes: it’s serious, and you might be seeing multiple signals converging, but you still have the ability to intervene and course-correct. Level 4 is where we believe that ability to change course may start to break down. Now you might be seeing active concealment, real harm might have occurred, and the system might already be working against your safety mechanisms. You’re not just managing a problem anymore—you’re racing against it.
Mariami: Importantly, we don’t want this scenario to play out at any severity level. Our findings are probabilistic and forward-looking, but we think policymakers can benefit from that strategic foresight—especially in domains where waiting for harm to materialize is too costly.