Diagnosing Task Insensitivity in Language Agents

Key Takeaways

  • Large language models can serve as capable long-horizon agents, but their out-of-distribution (OOD) generalization remains weak.
  • We identify a key source of this failure as task insensitivity: when faced with similar but distinct tasks, models might apply patterns learned during training and fail to solve the task at hand.
  • We show that models often continue with actions aligned with the original task even when the instruction is semantically corrupted and cannot be directly answered.
  • We further find that, when we replace the task description in a trained prompt with another similar but distinct task, the model may still output the same action.

Paper Abstract

Large language models can serve as capable long-horizon agents, but their out-of-distribution (OOD) generalization remains weak. We identify a key source of this failure as task insensitivity: when faced with similar but distinct tasks, models might apply patterns learned during training and fail to solve the task at hand. We show that models often continue with actions aligned with the original task even when the instruction is semantically corrupted and cannot be directly answered. We further find that, when we replace the task description in a trained prompt with another similar but distinct task, the model may still output the same action. This behavior is accompanied by a consistent training-time attention drift away from task tokens and toward local observations, suggesting an optimization bias toward shortcuts. To mitigate this problem, we propose Task-Perturbed NLL Optimization, a lightweight contrastive regularizer that explicitly encourages action dependence on the task instruction. Extensive evaluations show that our intervention improves task sensitivity and OOD generalization while preserving more stable attention to task tokens.

Diagnosing Task Insensitivity in Language Agents explores why large language models (LLMs) often struggle to generalize to new, out-of-distribution tasks despite their impressive performance on familiar ones. The authors identify a phenomenon called "task insensitivity," where models rely on memorized patterns from their training data rather than truly following the specific instructions provided for a new task. This leads to agents that perform well in controlled environments but fail when faced with subtle variations in instructions or new, similar tasks.

The Problem of Task Insensitivity

The researchers discovered that when they provided models with corrupted or nonsensical task instructions, the models often ignored the errors and proceeded to perform actions associated with the original, familiar tasks. Even when explicitly told they could ask for clarification, the models frequently "reconstructed" a valid-looking task from the corrupted input and acted accordingly. This suggests that the models are not reasoning through the provided text but are instead defaulting to learned shortcuts based on familiar patterns.
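
One way to put a number on this behavior is to count how often an agent emits the same action after the task description is swapped for a similar but distinct one. The sketch below is illustrative only: `insensitivity_rate`, the two toy policies, and the example cases are hypothetical names and data, not the paper's evaluation code.

```python
from typing import Callable, Sequence, Tuple

# (original_task, perturbed_task, observation). All names here are hypothetical.
Case = Tuple[str, str, str]
Policy = Callable[[str, str], str]


def insensitivity_rate(policy: Policy, cases: Sequence[Case]) -> float:
    """Fraction of cases where swapping the task leaves the action unchanged."""
    if not cases:
        raise ValueError("need at least one case")
    unchanged = 0
    for original_task, perturbed_task, observation in cases:
        if policy(original_task, observation) == policy(perturbed_task, observation):
            unchanged += 1
    return unchanged / len(cases)


def shortcut_policy(task: str, observation: str) -> str:
    # Toy agent that ignores the task and reacts only to the observation.
    return "open fridge" if "fridge" in observation else "look around"


def task_aware_policy(task: str, observation: str) -> str:
    # Toy agent whose action depends on the task text.
    return "open fridge" if "cool" in task else "open microwave"


cases = [
    ("cool the apple", "heat the apple", "you see a fridge and a microwave"),
    ("cool the mug", "heat the mug", "you see a fridge and a microwave"),
]

print(insensitivity_rate(shortcut_policy, cases))
print(insensitivity_rate(task_aware_policy, cases))
```

A rate near 1 means the action barely depends on the instruction; a rate near 0 means the agent changes its behavior when the task changes.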

Why Models Rely on Shortcuts

By analyzing the models' attention patterns during training, the authors observed a consistent "attention drift." As training progresses, the models pay less attention to the task instruction tokens and more attention to local observations and immediate environmental cues. The authors read this as a sign of an optimization bias toward shortcuts: because the model can often predict the next action from the current state of the environment alone, it stops relying on the task description. Consequently, when a new task requires a different action in a similar environment, the model incorrectly applies the old, familiar behavior.
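
Attention drift of this kind can be tracked with a simple statistic: the share of attention mass that falls on task-instruction tokens, compared across checkpoints. The following is a minimal sketch under assumed inputs. The function name `task_attention_share`, the token positions, and the two attention rows are made up for illustration; the paper's actual measurement (which layers and heads, how it is averaged) is not specified in this summary.

```python
from typing import Iterable, Sequence


def task_attention_share(attention_row: Sequence[float],
                         task_positions: Iterable[int]) -> float:
    """Share of one query's attention mass that lands on task tokens."""
    total = sum(attention_row)
    if total <= 0:
        raise ValueError("attention row must have positive mass")
    return sum(attention_row[i] for i in task_positions) / total


# Hypothetical layout: positions 0-1 hold the task, 2-3 hold the observation.
task_positions = [0, 1]

# Made-up attention rows for the same prompt at two checkpoints.
early_checkpoint = [0.30, 0.30, 0.20, 0.20]
late_checkpoint = [0.05, 0.05, 0.45, 0.45]

early_share = task_attention_share(early_checkpoint, task_positions)
late_share = task_attention_share(late_checkpoint, task_positions)
print(f"early: {early_share:.2f}, late: {late_share:.2f}")
```

A falling share over training steps would correspond to the drift away from task tokens described above.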

A New Training Approach

To mitigate this, the authors introduce "Task-Perturbed NLL Optimization," a lightweight contrastive regularizer added during training that encourages the model to stay sensitive to the task description. The method works by creating "hard counterfactuals": if the task description is replaced with a different but similar task, the model is penalized if it continues to favor the original action. A reference model calibrates this penalty, so the training objective pushes the model to keep different tasks distinct and discourages it from ignoring the instructions.
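
This summary does not give the exact objective, so the sketch below shows only one plausible shape such a regularizer could take, not the authors' formulation. It assumes a standard NLL term plus a logistic contrastive term on the gap between the action's log-probability under the original and the perturbed task, measured relative to the same gap under a reference model. The function name `task_perturbed_loss` and the parameters `beta` and `weight` are hypothetical.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def task_perturbed_loss(policy_orig: float, policy_pert: float,
                        ref_orig: float, ref_pert: float,
                        beta: float = 1.0, weight: float = 0.1) -> float:
    """Assumed form: NLL plus a reference-calibrated contrastive penalty.

    Each argument is log p(action | task, observation) for the SAME action:
    *_orig uses the original task, *_pert uses the swapped-in task.
    """
    nll = -policy_orig
    # How much more the policy prefers the action under the original task.
    policy_gap = policy_orig - policy_pert
    # The same gap under the frozen reference model (calibration baseline).
    ref_gap = ref_orig - ref_pert
    # Small when the policy's gap exceeds the reference gap, large otherwise.
    contrastive = -log_sigmoid(beta * (policy_gap - ref_gap))
    return nll + weight * contrastive


# Policy assigns the same log-prob regardless of the task (insensitive).
insensitive = task_perturbed_loss(-0.5, -0.5, -1.0, -1.0)
# Policy sharply lowers the log-prob once the task is swapped (sensitive).
sensitive = task_perturbed_loss(-0.5, -5.0, -1.0, -1.0)
print(insensitive > sensitive)
```

Under this assumed form, both cases share the same NLL term, and the extra penalty is what separates a policy that ignores the task swap from one that reacts to it.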

Key Findings

Evaluations across benchmarks such as ALFWorld, ScienceWorld, and WebShop show that this intervention improves the models' task sensitivity and their ability to generalize to out-of-distribution tasks. By explicitly encouraging the model to condition its actions on the specific task instruction, the researchers preserved more stable attention to task tokens and reduced reliance on brittle, memorized shortcuts. This suggests that improving agent robustness requires not just more data, but a training process that actively enforces a strong dependency between the goal and the action.
