Diagnosing Task Insensitivity in Language Agents

Diagnosing Task Insensitivity in Language Agents explores why large language models (LLMs) often struggle to generalize to new, out-of-distribution tasks despite their impressive performance on familiar ones. The authors identify a phenomenon called "task insensitivity," where models rely on memorized patterns from their training data rather than truly following the specific instructions provided for a new task. This leads to agents that perform well in controlled environments but fail when faced with subtle variations in instructions or new, similar tasks.

The Problem of Task Insensitivity

The researchers discovered that when they provided models with corrupted or nonsensical task instructions, the models often ignored the errors and proceeded to perform actions associated with the original, familiar tasks. Even when explicitly told they could ask for clarification, the models frequently "reconstructed" a valid-looking task from the corrupted input and acted accordingly. This suggests that the models are not reasoning through the provided text but are instead defaulting to learned shortcuts based on familiar patterns.
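The probe described above can be sketched in a few lines. This is only an illustration under stated assumptions: the summary does not say how the authors corrupted instructions, so word shuffling is a stand-in, and `corrupt_instruction` and `followed_shortcut` are hypothetical helper names.

```python
import random


def corrupt_instruction(instruction: str, seed: int = 0) -> str:
    """Shuffle the words of an instruction (assumed corruption scheme)."""
    words = instruction.split()
    rng = random.Random(seed)  # seeded so the probe is reproducible
    rng.shuffle(words)
    return " ".join(words)


def followed_shortcut(action_clean: str, action_corrupted: str) -> bool:
    """True if the agent chose the same action despite the corrupted task text."""
    return action_clean == action_corrupted


original = "put the clean mug on the shelf"
corrupted = corrupt_instruction(original, seed=1)
```

In use, one would run the agent once with `original` and once with `corrupted` from the same environment state, then count how often `followed_shortcut` returns True; a high rate would point to the insensitivity described above.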

Why Models Rely on Shortcuts

By analyzing the internal attention mechanisms of these models during training, the authors observed a consistent "attention drift." As training progresses, the models pay less attention to the task instruction tokens and more attention to local observations and immediate environmental cues. This creates an optimization bias: because the model can often predict the next action using only the current state of the environment, it stops relying on the task description. Consequently, when a new task requires a different action despite a similar environment, the model incorrectly applies the old, familiar behavior.
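One way to make "attention drift" concrete is to track how much attention mass lands on the task-instruction tokens at different checkpoints. The sketch below is a toy illustration, not the authors' measurement: the attention rows are invented numbers, and `task_attention_share` is a hypothetical helper.

```python
from typing import Sequence


def task_attention_share(attn_row: Sequence[float], task_span: range) -> float:
    """Fraction of one query position's attention mass that falls on task tokens."""
    total = sum(attn_row)
    if total <= 0:
        raise ValueError("attention row must have positive mass")
    return sum(attn_row[i] for i in task_span) / total


# Toy attention rows over 5 context tokens; tokens 0-1 are the task instruction,
# tokens 2-4 are local observations. Values are made up for illustration.
task_span = range(0, 2)
early_checkpoint = [0.30, 0.25, 0.15, 0.20, 0.10]
late_checkpoint = [0.05, 0.05, 0.30, 0.35, 0.25]

early_share = task_attention_share(early_checkpoint, task_span)
late_share = task_attention_share(late_checkpoint, task_span)
drift = early_share - late_share  # positive means attention moved away from the task
```

With a real model, the rows would come from averaged attention weights at the action-prediction position, and a steadily growing `drift` across checkpoints would match the pattern described above.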

A New Training Approach

To fix this, the authors introduce "Task-Perturbed NLL Optimization," a lightweight regularizer added during training that forces the model to remain sensitive to the task description. The method builds "hard counterfactuals": the task description is replaced with a different but similar task, and the model is penalized if it continues to favor the original action. A reference model calibrates this penalty, so the training objective keeps the model's behavior distinct across tasks and discourages it from ignoring the instructions.
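The summary does not give the exact objective, so the following is a minimal sketch of one plausible form, assuming a hinge penalty on a reference-calibrated log-likelihood gap. The function name, the `lam` and `margin` parameters, and the hinge form itself are assumptions for illustration, not the authors' formulation.

```python
def task_perturbed_loss(
    logp_orig: float,      # log p_theta(action | state, original task)
    logp_pert: float,      # log p_theta(action | state, perturbed task)
    ref_logp_orig: float,  # same quantity under the frozen reference model
    ref_logp_pert: float,
    lam: float = 0.1,      # regularizer weight (assumed)
    margin: float = 1.0,   # required sensitivity gap (assumed)
) -> float:
    """Standard NLL plus a penalty when the action stays likely under the wrong task."""
    nll = -logp_orig
    # How much more the policy separates the two tasks than the reference does.
    gap = (logp_orig - ref_logp_orig) - (logp_pert - ref_logp_pert)
    penalty = max(0.0, margin - gap)  # zero once the gap exceeds the margin
    return nll + lam * penalty


# A model that ignores the task assigns the same likelihood under both descriptions.
insensitive = task_perturbed_loss(-0.2, -0.2, -1.0, -1.0)
# A task-sensitive model makes the original action unlikely under the perturbed task.
sensitive = task_perturbed_loss(-0.2, -3.0, -1.0, -1.0)
```

Under these assumptions the insensitive model pays the full margin penalty while the sensitive one pays only the ordinary NLL, which is the kind of pressure the paragraph above describes.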

Key Findings

Extensive evaluations across benchmarks like ALFWorld, ScienceWorld, and WebShop show that this intervention significantly improves the models' ability to generalize to out-of-distribution tasks. By explicitly rewarding the model for conditioning its actions on the specific task instruction, the researchers were able to preserve more stable attention to task tokens and reduce the reliance on brittle, memorized shortcuts. This suggests that improving agent robustness requires not just more data, but a training process that actively enforces a strong dependency between the goal and the action.
