Bridging the Last Mile of Time Series Forecasting with LLM Agents
In real-world business settings, a statistical forecast is rarely ready to use as delivered. Foundation models are excellent at predicting numerical trends from historical data, but they often lack the "business context" (holiday schedules, marketing campaigns, unexpected external events) that human planners use to adjust predictions. This paper introduces a framework that treats this final stage, called "last-mile forecasting," as a structured, auditable, agent-driven process. By placing an LLM agent on top of a standard forecasting model, the system supports context-aware revisions that are transparent and easy to track.

A Unified Workspace for Forecast Revision

The core of this framework is a "forecast workspace." Instead of asking an AI to simply generate a new forecast from scratch, the system maintains a shared state that includes the original historical data, the immutable baseline forecast, and an editable version of the forecast. This separation ensures that the agent cannot accidentally overwrite the baseline or corrupt the historical record. By keeping these elements in one place, the agent can compare its proposed changes against the original statistical prediction, ensuring that every adjustment is intentional and grounded in specific evidence.
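One way to picture the workspace is as a small container with two read-only views and one editable copy. The following is a minimal sketch, not the paper's implementation: the class name `ForecastWorkspace`, its field names, and the date-keyed dictionaries are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Mapping


@dataclass
class ForecastWorkspace:
    """Hypothetical shared state: read-only history and baseline, editable revision."""
    history: Mapping[str, float]
    baseline: Mapping[str, float]
    revised: dict = field(init=False)

    def __post_init__(self):
        # Wrap private copies in read-only views so item assignment fails.
        self.history = MappingProxyType(dict(self.history))
        self.baseline = MappingProxyType(dict(self.baseline))
        # The editable forecast starts as a copy of the baseline.
        self.revised = dict(self.baseline)

    def diff(self) -> dict:
        """Return revised-minus-baseline for every date that was changed."""
        return {
            d: self.revised[d] - self.baseline[d]
            for d in self.baseline
            if self.revised[d] != self.baseline[d]
        }


ws = ForecastWorkspace(
    history={"2024-12-22": 100.0, "2024-12-23": 110.0},
    baseline={"2024-12-24": 120.0, "2024-12-25": 115.0},
)
ws.revised["2024-12-24"] = 150.0   # allowed: edits go to the revised copy
print(ws.diff())                   # {'2024-12-24': 30.0}
```

In this sketch, an attempt such as `ws.baseline["2024-12-24"] = 0.0` raises `TypeError`, which is the property the separation is meant to provide: the agent can always compare against the baseline but can never edit it.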

Constrained Actions and Evidence-Based Reasoning

To keep the system controllable and reliable, the agent is restricted to a fixed set of "revision actions." Rather than outputting free-form text, the agent first uses tools to retrieve evidence, such as calendar events or historical analogs, and then applies precise edits, such as adjusting a specific date range or overriding a single point in time. Every action the agent takes is recorded in a revision trace. This creates an audit trail that lets human planners see exactly why a change was made, what evidence supported it, and how the final forecast differs from the initial statistical baseline.
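The idea of a closed action set with a logged trace can be sketched as follows. This is an illustrative assumption, not the paper's API: the names `Reviser`, `Revision`, `scale_range`, and `override_point` are hypothetical, and the evidence strings stand in for whatever the retrieval tools would return.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Revision:
    """One logged edit: what was done, to which dates, and why."""
    action: str      # "scale_range" or "override_point"
    dates: tuple
    value: float     # scale factor or override value
    evidence: str


@dataclass
class Reviser:
    baseline: dict
    revised: dict = field(init=False)
    trace: list = field(default_factory=list)

    def __post_init__(self):
        self.revised = dict(self.baseline)   # edits never touch the baseline

    def _check(self, dates):
        unknown = [d for d in dates if d not in self.revised]
        if unknown:
            raise KeyError(f"dates outside the forecast horizon: {unknown}")

    def scale_range(self, dates, factor, evidence):
        """Multiply every date in the range by a factor and log the edit."""
        self._check(dates)
        for d in dates:
            self.revised[d] *= factor
        self.trace.append(Revision("scale_range", tuple(dates), factor, evidence))

    def override_point(self, date, value, evidence):
        """Replace a single forecast value and log the edit."""
        self._check([date])
        self.revised[date] = value
        self.trace.append(Revision("override_point", (date,), value, evidence))


rev = Reviser(baseline={"2024-12-24": 100.0, "2024-12-25": 100.0, "2024-12-26": 100.0})
rev.scale_range(["2024-12-24", "2024-12-25"], 1.25, evidence="calendar: public holiday")
rev.override_point("2024-12-26", 80.0, evidence="analog: same period last year")
for step in rev.trace:
    print(step.action, step.dates, step.value, step.evidence)
```

Because the only way to change `revised` in this sketch is through the two logged methods, the trace and the final forecast cannot drift apart, which is what makes the audit trail trustworthy.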

Handling Long Horizons and Self-Improvement

For long-horizon forecasting, the framework uses a "map-reduce" approach. It decomposes a long timeline into smaller, manageable event windows. In the map step, a local reasoner examines each window individually and proposes specific revisions; in the reduce step, these proposals are aggregated into the final forecast. The system also includes a memory bank for post-hoc reflection. Once actual data becomes available, the system compares its revised forecast to the real-world outcome and stores the lessons as structured experiences. This lets the agent improve its future calibration and decision-making without retraining the underlying forecasting model.
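The map-reduce flow and the reflection step can be sketched in a few functions. Everything here is an assumption for illustration: the function names, the fixed-radius windows, the rule-based `local_reasoner` (a stand-in for the LLM call), the last-proposal-wins merge, and the dictionary format of a stored experience are not taken from the paper.

```python
def split_windows(dates, events, radius=1):
    """Build one window of dates around each event date."""
    windows = []
    for event_date, label in events.items():
        i = dates.index(event_date)
        lo, hi = max(0, i - radius), min(len(dates), i + radius + 1)
        windows.append((label, dates[lo:hi]))
    return windows


def local_reasoner(label, window, baseline):
    """Map step. Stand-in for the LLM: propose values for one window."""
    uplift = {"holiday": 1.5}.get(label, 1.0)   # hypothetical rule
    return {d: baseline[d] * uplift for d in window}


def reduce_proposals(baseline, proposals):
    """Reduce step. Merge proposals; on overlap the later proposal wins."""
    revised = dict(baseline)
    for proposal in proposals:
        revised.update(proposal)
    return revised


def mae(forecast, actuals):
    """Mean absolute error over the dates that have actuals."""
    return sum(abs(forecast[d] - actuals[d]) for d in actuals) / len(actuals)


def reflect(baseline, revised, actuals, memory):
    """Post-hoc reflection: record whether the revision beat the baseline."""
    experience = {
        "mae_baseline": mae(baseline, actuals),
        "mae_revised": mae(revised, actuals),
    }
    experience["helped"] = experience["mae_revised"] < experience["mae_baseline"]
    memory.append(experience)
    return experience


dates = ["d1", "d2", "d3", "d4", "d5"]
baseline = {d: 100.0 for d in dates}
events = {"d3": "holiday"}

proposals = [local_reasoner(label, w, baseline) for label, w in split_windows(dates, events)]
revised = reduce_proposals(baseline, proposals)

memory = []
actuals = {"d1": 100.0, "d2": 140.0, "d3": 160.0, "d4": 150.0, "d5": 100.0}
print(reflect(baseline, revised, actuals, memory))
# {'mae_baseline': 30.0, 'mae_revised': 4.0, 'helped': True}
```

Dates outside every window keep their baseline value, so the revision stays local to the events. The stored experiences could later be retrieved as context for similar events, which is how calibration might improve without touching the forecasting model.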

Real-World Performance

Case studies on air travel ticket data show that this agentic approach significantly outperforms standard statistical models during complex periods such as holidays. Targeted, evidence-backed revisions sharply reduce error during high-impact events while preserving accuracy across the rest of the forecast horizon. By closing the gap between raw statistical output and business-ready planning, the system gives planners a more reliable and transparent tool for operational decision-making.
