Towards Responsibly Non-Compliant Machines
This paper explores the challenge of engineering autonomous intelligent agents that can "responsibly" refuse user requests. While machines are designed to serve human needs, the authors argue that a system that always complies is not only limited in its intelligence but potentially unsafe. The research proposes a framework for machines to evaluate when to disobey, how to justify that refusal to the user, and under what conditions a human should be allowed to override that decision.
Why Machines Need to Say No
The authors argue that non-compliance is necessary for both safety and efficiency. A machine might need to refuse a command because it is physically impossible (e.g., an empty battery), unsafe (e.g., a robot avoiding a collision), or contrary to ethical and legal norms (e.g., refusing to send phishing emails). By moving beyond simple obedience, machines can act as more reliable partners that protect their users, the environment, and their own operational integrity.
The Life-Cycle of a Refusal
To move from blind obedience to responsible non-compliance, the authors suggest a structured process. When a user issues a command, the machine should not react immediately. Instead, it should enter a deliberation phase where it evaluates the request against safety, normative, and priority criteria. If the machine decides to refuse, it must provide a clear justification. This transparency allows the user to understand the machine’s reasoning and determine if the refusal is based on a misunderstanding or a valid constraint.
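The deliberation phase described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the `Decision` type, the `deliberate` function, the request fields, and the three check functions are all hypothetical names chosen for this example. It shows a request being evaluated against safety, normative, and priority criteria in turn, with a justification attached to any refusal.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Decision:
    comply: bool
    justification: str  # empty string when the machine complies


# A check returns None when the request passes, or a reason string when it fails.
Check = Callable[[dict], Optional[str]]


def safety_check(request: dict) -> Optional[str]:
    # Hypothetical field: the request carries a sensor flag.
    return "obstacle in path" if request.get("obstacle_ahead") else None


def norm_check(request: dict) -> Optional[str]:
    # Hypothetical list of prohibited actions.
    prohibited = {"send_phishing_email"}
    return "action is prohibited" if request.get("action") in prohibited else None


def priority_check(request: dict) -> Optional[str]:
    # Hypothetical field: a more important task is already running.
    if request.get("busy_with_higher_priority"):
        return "a higher-priority task is running"
    return None


CHECKS: list[tuple[str, Check]] = [
    ("safety", safety_check),
    ("normative", norm_check),
    ("priority", priority_check),
]


def deliberate(request: dict, checks: list[tuple[str, Check]] = CHECKS) -> Decision:
    """Evaluate the request before acting; refuse with a justification on the first failed check."""
    for criterion, check in checks:
        reason = check(request)
        if reason is not None:
            return Decision(False, f"Refused on {criterion} grounds: {reason}")
    return Decision(True, "")
```

Returning the failed criterion together with the reason is one way to give the user enough information to judge whether the refusal rests on a misunderstanding or on a valid constraint.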
Managing Overrides and Liability
A critical component of this framework is the ability for a human to "override" a machine’s refusal. The authors categorize reasons for non-compliance according to whether the user should be able to override them. For instance, while a user might be allowed to override a refusal based on efficiency or ethical concerns, they should not be able to override a refusal based on environmental safety, where the risk to others is high. The authors emphasize that when a user forces a machine to comply against its "better judgment," the user must formally accept liability for the potential consequences.
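A minimal sketch of such an override policy follows. The three reason categories and their overridable status are taken from the examples in the text; the names `Reason`, `OVERRIDABLE`, `OverrideLog`, and `request_override`, and the idea of recording accepted liability in a log, are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from enum import Enum


class Reason(Enum):
    EFFICIENCY = "efficiency"
    ETHICS = "ethics"
    ENVIRONMENTAL_SAFETY = "environmental_safety"


# Overridable status per refusal reason, following the examples in the text.
OVERRIDABLE = {
    Reason.EFFICIENCY: True,
    Reason.ETHICS: True,
    Reason.ENVIRONMENTAL_SAFETY: False,
}


@dataclass
class OverrideLog:
    # Each entry is (user, reason) for an override where liability was accepted.
    entries: list = field(default_factory=list)


def request_override(reason: Reason, user: str, accepts_liability: bool,
                     log: OverrideLog) -> bool:
    """Return True if the refusal may be overridden and the user took on liability."""
    if not OVERRIDABLE[reason]:
        return False  # some refusals cannot be overridden by any user
    if not accepts_liability:
        return False  # an override requires formal acceptance of liability
    log.entries.append((user, reason.value))
    return True
```

For example, `request_override(Reason.EFFICIENCY, "alice", True, log)` succeeds and records the acceptance, while any override attempt for `Reason.ENVIRONMENTAL_SAFETY` is rejected regardless of what the user accepts.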
Engineering Future Agents
The paper outlines three potential architectures for implementing these systems:
Deliberate non-compliance: The machine is pre-programmed to refuse specific tasks or requests from certain users.
Predictable non-compliance: The machine follows a logical pipeline (checking feasibility, safety, norms, and efficiency) to decide when to disobey based on dynamic thresholds.
Learnt non-compliance: The machine observes its own interactions to identify patterns and contexts where refusal is the most appropriate course of action.
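The second architecture, predictable non-compliance, lends itself to a short sketch. The stage order comes from the description above; everything else is an assumption for illustration: the numeric scores in the range 0 to 1, the baseline threshold values, the `humans_nearby` context flag, and the function names `thresholds_for` and `decide` are hypothetical and do not come from the paper.

```python
# Stage order as listed in the text: feasibility, safety, norms, efficiency.
STAGES = ("feasibility", "safety", "norms", "efficiency")


def thresholds_for(context: dict) -> dict:
    """Return the minimum acceptable score per stage for the current context."""
    # Hypothetical baseline values, chosen only for this example.
    limits = {"feasibility": 0.5, "safety": 0.7, "norms": 0.6, "efficiency": 0.3}
    # "Dynamic" threshold: demand a higher safety score when people are close by.
    if context.get("humans_nearby"):
        limits["safety"] = 0.9
    return limits


def decide(scores: dict, context: dict) -> tuple:
    """Run the pipeline; return (comply, failed_stage)."""
    limits = thresholds_for(context)
    for stage in STAGES:
        if scores[stage] < limits[stage]:
            return (False, stage)  # refuse at the first stage that falls short
    return (True, None)
```

With the same scores, the outcome can change with context: a request whose safety score is 0.8 passes under the baseline threshold but is refused at the safety stage when `humans_nearby` is set.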
The authors conclude that while this is a preliminary sketch, the next step for the field is to integrate these concepts into a formal, robust architecture for autonomous agents.