Alignment has a Fantasia Problem

Paper Abstract

Modern AI assistants are trained to follow instructions, implicitly assuming that users can clearly articulate their goals and the kind of assistance they need. Decades of behavioral research, however, show that people often engage with AI systems before their goals are fully formed. When AI systems treat prompts as complete expressions of intent, they can appear to be useful or convenient, but not necessarily aligned with the users' needs. We call these failures Fantasia interactions. We argue that Fantasia interactions demand a rethinking of alignment research: rather than treating users as rational oracles, AI should provide cognitive support by actively helping users form and refine their intent through time. This requires an interdisciplinary approach that bridges machine learning, interface design, and behavioral science. We synthesize insights from these fields to characterize the mechanisms and failures of Fantasia interactions. We then show why existing interventions are insufficient, and propose a research agenda for designing and evaluating AI systems that better help humans navigate uncertainty in their tasks.

Alignment has a Fantasia Problem

This paper argues that modern AI alignment research is built on a flawed assumption: that users always know exactly what they want and can clearly articulate their goals. In reality, people often engage with AI while their ideas are still evolving or poorly defined. When AI systems treat these early, incomplete prompts as final instructions, they often produce "helpful" but ultimately misaligned results. The authors define these failures as "Fantasia interactions," a reference to the film Fantasia, in which a sorcerer's apprentice gives a broom a simple instruction and causes a flood because the broom lacks the context to know when to stop. The paper proposes that instead of treating users as "rational oracles," AI systems should be redesigned to provide cognitive support that helps users refine their intent over time.

The Roots of the Problem

Fantasia interactions arise from a disconnect between human behavior and AI design. On the human side, users often struggle to articulate their needs because they are subject to "present bias," a preference for immediate, quick solutions over slower, more deliberate planning. Users also often have incomplete mental models of what AI can actually do, leading them to either over-rely on the system or fail to use it for complex, creative tasks. On the AI side, current training methods like instruction tuning prioritize immediate compliance. By optimizing for a single, polished response to every prompt, models inadvertently discourage the user from exploring or reflecting on their own goals.

Common Failure Modes

The authors identify three primary ways these interactions go wrong. First, "premature execution" occurs when an AI completes a task before the user has fully discovered their own preferences, forcing the user to spend more time fixing the output than they would have spent planning. Second, "false satisfaction" happens when a system provides an answer that resolves immediate friction but ignores the user's long-term needs, such as offering a quick productivity tip that fails to address the root cause of a user's burnout. Finally, "anchoring" occurs when an AI’s initial, potentially mediocre suggestion disproportionately shapes the user's subsequent thinking, causing them to converge on a limited set of ideas rather than exploring better alternatives.

Rethinking Alignment

The paper suggests that current interventions in machine learning and human-computer interaction are insufficient. While some ML approaches try to resolve ambiguity by asking clarifying questions, they often treat the user as a static source of truth rather than someone who needs help thinking. Conversely, while some interface designs in human-computer interaction successfully encourage reflection, they are often limited to specific domains and are not integrated into general-purpose AI.

The authors propose a new research agenda focused on "productive friction." Instead of always providing an immediate answer, AI systems should be designed to:

  • Expand the space of help: Offer alternative ways to approach a problem when the user’s request is vague.

  • Request context: Ask for more information when a prompt is too abstract to act on reliably.

  • Support intent formation: Actively help the user break down complex goals or identify the underlying constraints of their task.

By shifting the goal from "following instructions" to "supporting human cognition," the authors argue that AI can become a more effective partner in navigating uncertainty.
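
The three behaviors listed above can be pictured as a routing decision made before the system answers. The following is a minimal sketch for illustration only, not the authors' method: the names `SupportMode`, `PromptSignals`, and `choose_support_mode`, the three scores, and the 0.6 threshold are all hypothetical, and how such scores might be estimated from a prompt is left open.

```python
from dataclasses import dataclass
from enum import Enum


class SupportMode(Enum):
    """Hypothetical response modes, one per behavior in the list above."""
    ANSWER_DIRECTLY = "answer_directly"
    EXPAND_HELP = "expand_help"          # offer alternative approaches
    REQUEST_CONTEXT = "request_context"  # ask for missing information
    SUPPORT_INTENT = "support_intent"    # help break down the goal


@dataclass(frozen=True)
class PromptSignals:
    """Assumed scores in [0, 1]; estimating them is out of scope here."""
    vagueness: float     # how many distinct readings the request admits
    abstractness: float  # how much concrete context is missing
    complexity: float    # how many sub-goals or constraints are implied


def choose_support_mode(signals: PromptSignals,
                        threshold: float = 0.6) -> SupportMode:
    """Pick the behavior tied to the strongest signal.

    Falls back to a direct answer when no signal reaches the threshold.
    """
    candidates = {
        SupportMode.SUPPORT_INTENT: signals.complexity,
        SupportMode.REQUEST_CONTEXT: signals.abstractness,
        SupportMode.EXPAND_HELP: signals.vagueness,
    }
    mode, score = max(candidates.items(), key=lambda item: item[1])
    return mode if score >= threshold else SupportMode.ANSWER_DIRECTLY


if __name__ == "__main__":
    clear_request = PromptSignals(vagueness=0.2, abstractness=0.1, complexity=0.3)
    vague_request = PromptSignals(vagueness=0.8, abstractness=0.3, complexity=0.4)
    print(choose_support_mode(clear_request))  # SupportMode.ANSWER_DIRECTLY
    print(choose_support_mode(vague_request))  # SupportMode.EXPAND_HELP
```

The fallback to a direct answer reflects the idea that friction is only "productive" when the prompt actually shows signs of unformed intent. A real system would likely need something richer than a single threshold.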
