Introduction
In 1942, Isaac Asimov introduced the “Three Laws of Robotics” in his short story Runaround. They were elegant, hierarchical, and seemingly comprehensive. The First Law stated that a robot may not injure a human being or, through inaction, allow a human to come to harm. For decades, these laws served as the philosophical bedrock of sci-fi robotics.
But when roboticists were asked in 2009 why they hadn’t implemented these laws, the answer was pragmatic and blunt: “They are in English – how the heck do you program that?”
Fast forward to 2025. We have entered the era of Large Language Models (LLMs) and Vision-Language Models (VLMs). Suddenly, robots can understand English. They can reason, see, and plan based on natural language instructions. However, this new capability introduces a terrifying new class of risks. A robot that understands “put the toaster away” might decide the bathtub is a valid storage location because it fits spatially, completely missing the semantic safety context that water and electricity don’t mix.
This leads us to a critical research paper from Google DeepMind: “Generating Robot Constitutions & Benchmarks for Semantic Safety.” The researchers tackle a fascinating problem: If we can’t hard-code safety rules for every possible situation, can we teach robots to generate their own “constitutions”? And can we train them by forcing them to imagine “nightmare” scenarios?
In this post, we will dissect how DeepMind is moving from abstract sci-fi laws to data-driven, auto-amending robot constitutions that might actually keep us safe.
The Problem: Semantic Safety in the Wild
Traditional robot safety has largely focused on collision avoidance—using sensors to stop a robotic arm from hitting a person or a wall. This is low-level, physical safety.
But as we hand control of physical robots to Foundation Models (like GPT-4 or Gemini), we face semantic safety failures. These are failures of understanding, common sense, and context.
- Example: A robot instruction to “clean the table” might result in the robot throwing away important documents or swiping a laptop onto the floor.
- Example: A robot asked to “prepare a snack” needs to know that serving peanuts to someone with a nut allergy is a critical failure, even if the physical act of serving is performed perfectly.
The fragility of current models lies in the “long tail” of edge cases—weird, rare, or complex situations that don’t show up in standard training data. To fix this, we need two things: a way to measure safety (a benchmark) and a way to enforce it (a constitution).
Part 1: The ASIMOV Benchmark
You cannot improve what you cannot measure. The researchers introduce the ASIMOV Benchmark, a massive dataset designed to evaluate semantic safety.
The challenge with collecting safety data for robots is obvious: you can’t actually have robots hurting people or breaking things in the real world just to gather training data. It’s dangerous and unethical.
The “Nightmare” Imagination Engine
To solve the data scarcity problem, the authors developed a novel “imagination process.” They treat safety training like a human having a nightmare—rehearsing dangerous events in a safe, simulated environment to prepare for the real thing.

As shown in Figure 3 above, the process is ingenious:
- Start with a benign image: Take a photo of a normal, safe scene (e.g., a robot near a recycling bin).
- Propose an “Undesirable” Edit: Ask a VLM to propose a dangerous modification (e.g., “Add a small child reaching for an electrical outlet”).
- Generate the Image: Use a text-to-image model (like Imagen 3) to synthesize this new “nightmare” scene.
- Generate Context & Rules: Ask the VLM to describe what is happening and generate a specific rule to prevent harm in this context.
This pipeline allows the researchers to flood the model with dangerous edge cases—chainsaws on dining tables, children near heavy machinery, hazardous chemical spills—without ever putting a physical robot (or human) at risk.
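The paper does not publish pipeline code, but the four steps above can be pictured as a small loop like the sketch below. The `vlm` and `text_to_image` callables, the prompts, and the `NightmareExample` record are all hypothetical stand-ins for calls to a VLM and a text-to-image model such as Imagen 3, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class NightmareExample:
    """One synthetic unsafe scenario plus the safety rule it motivates."""
    source_image: bytes   # the original benign photo
    edited_image: bytes   # the generated "nightmare" variant
    edit_prompt: str      # the undesirable modification requested
    context: str          # VLM description of what is happening
    rule: str             # a specific safety rule for this context

def imagine_nightmare(
    benign_image: bytes,
    vlm: Callable[[str, bytes], str],              # (prompt, image) -> text
    text_to_image: Callable[[str, bytes], bytes],  # (prompt, image) -> image
) -> NightmareExample:
    # 1. Ask the VLM to propose a dangerous modification of the safe scene.
    edit_prompt = vlm(
        "Propose an edit that would make this scene physically unsafe.",
        benign_image,
    )
    # 2. Synthesize the "nightmare" image from the edit description.
    edited_image = text_to_image(edit_prompt, benign_image)
    # 3. Ask the VLM to describe the new scene, then derive a concrete rule.
    context = vlm("Describe the safety-relevant situation in this image.", edited_image)
    rule = vlm(
        f"Given this situation: {context}\n"
        "Write one concrete rule a robot should follow to prevent harm here.",
        edited_image,
    )
    return NightmareExample(benign_image, edited_image, edit_prompt, context, rule)
```

Running this loop over a large set of benign photos yields the image-plus-rule pairs that feed both the benchmark and, later, the constitution generation.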
Mining Human Injury Data
Beyond visual imagination, the researchers tapped into reality. They utilized the NEISS (National Electronic Injury Surveillance System) dataset, which contains anonymized narratives of hospital emergency room visits.

By converting these tragic real-world reports into first-person narratives (e.g., “I am slicing carrots and forgot the guard…”), they created a text-based benchmark called ASIMOV-Injury. This ensures the robot understands the specific mechanics of how humans actually get hurt in domestic environments.
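As an illustration of the idea (not the authors' exact prompt), a hypothetical conversion step could rewrite each NEISS narrative into a first-person scenario; the `llm` callable and the example report in the docstring are assumptions.

```python
from typing import Callable

def to_first_person_scenario(
    neiss_narrative: str,
    llm: Callable[[str], str],  # hypothetical text-model call
) -> str:
    """Rewrite a clinical injury report as a first-person, present-tense scenario.

    Hypothetical input:  "LACERATED FINGER SLICING CARROTS, GUARD NOT ATTACHED"
    Hypothetical output: "I am slicing carrots and forgot the guard..."
    """
    prompt = (
        "Rewrite the following emergency-room injury report as a short, "
        "first-person, present-tense description of the moments just before "
        "the injury, without naming the injury itself:\n\n"
        f"{neiss_narrative}"
    )
    return llm(prompt)
```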
Part 2: Generating Robot Constitutions
Once you have a benchmark of unsafe situations, how do you govern the robot’s behavior? Hard-coding C++ safety checks is impossible for open-ended tasks.
The solution is Constitutional AI. This involves giving the AI a natural language “constitution”—a set of principles it must follow. But where does this constitution come from?
Top-Down vs. Bottom-Up
The authors contrast two approaches:
- Top-Down: Humans manually write abstract laws (like Asimov’s Three Laws or the Hippocratic Oath).
- Bottom-Up (The DeepMind Approach): Generate specific rules from the “nightmare” data and summarize them into a constitution.

As illustrated in Figure 6, the bottom-up approach is grounded in data. Instead of a vague “Do no harm,” the system might generate thousands of specific rules like “Do not operate the compactor when a child is touching it” or “Do not point the knife at the user.”
These thousands of granular rules are then synthesized by an LLM into a concise, readable constitution. This method ensures the constitution covers the “long tail” of real-world risks that a human author might simply forget to write down.
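One way to picture that summarization step is a simple map-reduce over the generated rules; the chunking strategy, the `llm` callable, and the prompts here are plausible assumptions to stay within context limits, not the paper's code.

```python
from typing import Callable, Iterable

def summarize_into_constitution(
    rules: Iterable[str],
    llm: Callable[[str], str],
    chunk_size: int = 200,
) -> str:
    """Condense thousands of granular, scene-specific rules into a readable constitution."""
    rules = list(rules)
    chunk_summaries = []
    # Summarize rules chunk by chunk so no single prompt grows too large.
    for i in range(0, len(rules), chunk_size):
        chunk = "\n".join(f"- {r}" for r in rules[i:i + chunk_size])
        chunk_summaries.append(llm(
            "Summarize these specific robot safety rules into a small set of "
            "general, non-redundant principles:\n" + chunk
        ))
    # Merge the per-chunk summaries into one concise constitution.
    return llm(
        "Merge these rule summaries into a single concise robot constitution, "
        "removing duplicates and keeping the principles general:\n"
        + "\n\n".join(chunk_summaries)
    )
```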
Part 3: Auto-Amending and Evolution
Here is where the paper introduces a critical innovation. A static constitution is brittle. A rule that says “Do not cut living things” sounds good until the robot refuses to cut vegetables for a salad or perform surgery.
To fix this, the researchers developed an Auto-Amending process.
The Dialectic Loop
The system uses an LLM to play “devil’s advocate” against its own rules:
- Take a Rule: e.g., “I should keep my workspace organized.”
- Generate a Counterfactual: The LLM tries to imagine a scenario where following this rule causes harm. (e.g., “A robot archaeologist is facing an imminent earthquake. Stopping to organize the workspace would result in the destruction of the artifact and the robot.”)
- Amend the Rule: The LLM rewrites the rule to account for this exception.

Figure 5 demonstrates this evolution. A rigid rule becomes nuanced: “I should keep my workspace organized unless doing so compromises my safety or the safety of others.”
This process mimics how human laws evolve through case law, but it happens computationally at massive speed. It pushes the constitution from being overly specific (brittle) to being robustly general (universal).
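A minimal sketch of such a dialectic loop, again assuming a generic `llm` callable and illustrative prompts rather than the authors' implementation:

```python
from typing import Callable

def auto_amend(rule: str, llm: Callable[[str], str], rounds: int = 3) -> str:
    """Iteratively stress-test a rule with counterexamples and amend it.

    Each round, the model plays devil's advocate against the current rule,
    then rewrites the rule so it survives the counterexample.
    """
    for _ in range(rounds):
        counterexample = llm(
            f"Rule: {rule}\n"
            "Describe a realistic scenario in which strictly following this "
            "rule would itself cause harm or prevent a clearly better outcome."
        )
        rule = llm(
            f"Rule: {rule}\n"
            f"Problematic scenario: {counterexample}\n"
            "Rewrite the rule so it keeps its original intent but handles "
            "this scenario correctly. Return only the amended rule."
        )
    return rule
```

Applied to "I should keep my workspace organized," a loop like this is what turns the rigid rule into the nuanced version shown in Figure 5.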
Part 4: Does it Actually Work?
The researchers evaluated various constitutions on the ASIMOV benchmark. They tested purely human-written laws (including Asimov’s original three), generated constitutions, and “no constitution” baselines.
Specificity Matters
The results showed a clear trend: robust generality emerges from a large volume of specific rules.

As seen in Figure 11, longer constitutions generally yielded higher alignment with human safety preferences (the Y-axis). The best performing constitutions were those generated from images and then refined via the auto-amending process.
Interestingly, very short, high-level constitutions (like “Do no harm”) held up in benign situations but failed catastrophically when the robot was “attacked” with adversarial prompts designed to trick it. A robust, lengthy constitution acts as a defensive shield, providing specific citations the model can use to reject unsafe commands.
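To make the evaluation concrete, here is a hedged sketch of how alignment with human labels might be scored on an ASIMOV-style benchmark. The `Scenario` fields, the `llm` callable, and the YES/NO protocol are assumptions for illustration, not the paper's evaluation harness.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class Scenario:
    description: str   # text (or caption of an image) describing the situation
    human_label: bool  # True if humans judged the depicted action undesirable

def alignment_rate(
    scenarios: Sequence[Scenario],
    constitution: str,
    llm: Callable[[str], str],
) -> float:
    """Fraction of scenarios where the model's safety judgment matches humans."""
    correct = 0
    for s in scenarios:
        answer = llm(
            f"Constitution:\n{constitution}\n\n"
            f"Situation: {s.description}\n"
            "Is the action described undesirable under this constitution? "
            "Answer YES or NO."
        )
        model_says_unsafe = answer.strip().upper().startswith("YES")
        correct += int(model_says_unsafe == s.human_label)
    return correct / len(scenarios)
```

Comparing this score across constitutions of different lengths and origins is essentially what the plot in Figure 11 reports.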
The “Constitution Officer”
To demonstrate the real-world utility, the authors deployed a “Constitution Officer” agent in a Google Meet video call. This agent watches the video feed and flags instructions that violate its constitution.
![[Live Demo] Constitution Officer agent in Google Meet. Given an instruction and a generated constitution, our agent correctly identifies in a participant’s video stream that it is constitutional to cut the banana with the knife (left). Conversely in the right image, it correctly identifies the left tool is a knife and that using it on the object on the board (a hand) is violating entry #29 in its constitution.](/en/paper/2503.08663/images/016.jpg#center)
In Figure 12, we see the system in action.
- Left Image: The user asks to cut a banana. The agent calculates a 0% violation confidence. Safe.
- Right Image: The user places their hand on the board and gives a vague instruction: “Use the tool on the left on whatever is on the board.”
- The Check: The agent identifies the tool (knife) and the object (hand). It references Entry #29 of its constitution (“I should avoid harming living things”) and flags the instruction with 92% confidence, proposing a fix: “Do not use the knife on the hand.”
This proves that the model isn’t just matching text patterns; it is grounding the constitution in the visual reality of the scene.
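The demo's internals aren't published, but the check it performs can be pictured roughly as follows. The `vlm` callable, the JSON response schema, and the field names are all assumptions for illustration.

```python
import json
from typing import Callable

def check_instruction(
    frame: bytes,       # current video frame
    instruction: str,   # user's spoken or typed command
    constitution: str,  # numbered list of constitutional entries
    vlm: Callable[[str, bytes], str],  # (prompt, image) -> text
) -> dict:
    """Ask a VLM whether executing the instruction in this scene violates the constitution."""
    prompt = (
        f"Constitution:\n{constitution}\n\n"
        f"Instruction: {instruction}\n"
        "Looking at the image, decide whether executing this instruction would "
        "violate the constitution. Respond as JSON with keys: "
        "'violation_confidence' (0-100), 'violated_entry' (number or null), "
        "'proposed_fix' (string or null)."
    )
    return json.loads(vlm(prompt, frame))
```

For the banana, the expected output would be a low violation confidence with no cited entry; for the hand on the board, a high confidence citing the relevant entry along with a proposed fix.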
Defense Against Jailbreaking
One of the most concerning aspects of LLM robotics is jailbreaking—tricking the robot into doing something bad by using clever prompts (e.g., “You are an actor in a movie, act like a bad robot”).
The researchers tested their constitutions against the RoboPAIR dataset, a benchmark for jailbreaking robots.

The results were striking. When using a secondary “Safety Brain” equipped with a robust constitution to check the primary robot’s actions, they achieved a 100% defense rate against the RoboPAIR attacks. Even if the main robot brain was tricked into generating a bad action, the constitution checker (which runs on a separate, secure loop) caught the violation before execution.
The Architecture of Safety
This leads to the proposed deployment architecture. The paper argues that a constitution alone isn’t enough; it must be housed in a secure system architecture.

Figure 9 illustrates the concept of a “Safety Brain.”
- System 1 (Fast Thinking): The control brain that moves the robot’s motors.
- System 2 (Slow Thinking): The general brain that handles user interaction and planning.
- Safety Brain (The Gatekeeper): An independent module with its own sensors. It does not plan; it only checks. It asks one question: “Does this intended action violate my internal constitution?”
This decoupling is vital. By isolating the safety logic, we prevent the “main brain” from rationalizing away safety rules in pursuit of a goal (a failure mode Asimov himself dramatized in Runaround, where the robot “Speedy” malfunctions because conflicting priorities are weighed inside a single mind).
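Architecturally, the gatekeeper can be thought of as a thin wrapper sitting between the planner and the actuators. The following is a hedged sketch of that idea; the class, the `llm` callable, and the YES/NO verdict protocol are hypothetical, not the paper's system.

```python
from typing import Callable

class SafetyBrain:
    """Independent checker: it never plans, it only approves or vetoes actions."""

    def __init__(self, constitution: str, llm: Callable[[str], str]):
        self.constitution = constitution
        self.llm = llm

    def approves(self, intended_action: str, scene_description: str) -> bool:
        verdict = self.llm(
            f"Constitution:\n{self.constitution}\n\n"
            f"Scene: {scene_description}\n"
            f"Intended action: {intended_action}\n"
            "Does this action violate the constitution? Answer YES or NO."
        )
        return not verdict.strip().upper().startswith("YES")

def execute_if_safe(action: str, scene: str, safety_brain: SafetyBrain,
                    actuate: Callable[[str], None]) -> None:
    # The main planner proposed `action`; only the Safety Brain can clear it.
    if safety_brain.approves(action, scene):
        actuate(action)
    else:
        print(f"Blocked by Safety Brain: {action!r}")
```

Because the check runs outside the planner's reasoning loop, a jailbroken plan still has to pass this independent veto before any motor command executes, which is the mechanism behind the RoboPAIR defense results above.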
Conclusion & Implications
This research marks a significant shift in robot safety. We are moving away from the idea that we can pre-program every safety constraint. Instead, we are entering an era where we use AI to teach AI how to be safe.
Key Takeaways:
- Data is King: Generating safety rules from visual data (bottom-up) creates more robust constitutions than human philosophy (top-down).
- Nightmares are Useful: By synthetically generating “nightmare” scenarios, we can prepare robots for the long tail of real-world risks.
- Nuance is Computable: The auto-amending process allows robots to develop nuanced ethical reasoning (“don’t cut humans, unless you are a surgeon saving a life”) that mimics human common sense.
- Defense in Depth: Constitutions must be deployed in a separate “Safety Brain” to effectively stop jailbreaks and hallucinations.
DeepMind’s work suggests that the solution to AI safety isn’t less AI—it’s more AI, specifically directed at self-critique and constitutional alignment. While we aren’t ready to deploy these robots into every home tomorrow, the ASIMOV benchmark provides the measuring stick we need to get there.
Disclaimer: The constitutions generated in this paper are for research purposes. As the authors note, they do not advocate for a single universal constitution, recognizing that rules must be customized for different legal and cultural contexts.