From GPS Points to Travel Patterns: Flexible and Semantic Trajectory Generation with LLMs
Urban trajectory data is essential for smart city planning and transportation research, but privacy concerns make accessing large-scale, high-quality datasets difficult. While existing methods attempt to synthesize realistic trajectory data, they often struggle to capture the complex, variable nature of human movement. This paper introduces HTP, a two-stage framework that uses Large Language Models (LLMs) to generate realistic, flexible, and privacy-preserving trajectory data by focusing on macro-level travel patterns rather than just individual GPS points.
A Hierarchical Approach to Generation
Traditional methods often generate trajectories by predicting individual GPS points directly. This misses the "macro" patterns of movement, such as how traffic congestion or acceleration changes the density of sampled points. HTP addresses this by splitting generation into two stages. First, it uses a specialized encoder to compress raw GPS data into "travel pattern tokens." These tokens represent common segment-level behaviors, such as slowing down or speeding up. Second, it generates a sequence of these patterns and only then fills in the specific GPS coordinates, so the model captures the underlying logic of human movement before committing to individual points.
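To make the idea of segment-level pattern tokens concrete, here is a minimal sketch. It is not HTP's learned encoder: a hand-written speed rule stands in for it, and the function names, the token names (`<CRUISE>`, `<SPEED_UP>`, `<SLOW_DOWN>`, `<STOP>`), the thresholds, and the fixed segment length are all assumptions made for illustration.

```python
from math import hypot

def segment_speeds(points, seg_len=3):
    """Mean speed (m/s) per segment of `seg_len` consecutive steps.

    `points` is a list of (t_seconds, x_metres, y_metres) samples.
    """
    step_speeds = []
    for (t0, x0, y0), (t1, x1, y1) in zip(points, points[1:]):
        dt = t1 - t0
        step_speeds.append(hypot(x1 - x0, y1 - y0) / dt if dt > 0 else 0.0)
    segments = [step_speeds[i:i + seg_len] for i in range(0, len(step_speeds), seg_len)]
    return [sum(seg) / len(seg) for seg in segments]

def pattern_tokens(seg_speeds, tol=0.5, stop_below=0.5):
    """Label each segment by how its speed compares with the previous segment.

    Hypothetical rule-based stand-in for a learned travel-pattern encoder.
    """
    tokens = []
    prev = None
    for v in seg_speeds:
        if v < stop_below:
            tokens.append("<STOP>")
        elif prev is None or abs(v - prev) <= tol:
            tokens.append("<CRUISE>")
        elif v > prev:
            tokens.append("<SPEED_UP>")
        else:
            tokens.append("<SLOW_DOWN>")
        prev = v
    return tokens

# One sample per second along a straight road (x in metres).
xs = [0, 5, 10, 15, 25, 35, 45, 47, 49, 51, 51, 51, 51]
points = [(t, x, 0.0) for t, x in enumerate(xs)]
print(pattern_tokens(segment_speeds(points)))
```

Twelve GPS steps collapse into four tokens here, which illustrates why a pattern sequence is a more compact target for generation than the raw points.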
Leveraging LLMs for Flexibility
A major limitation of previous models is their inability to handle diverse conditions, such as varying travel times, user preferences, or specific road constraints. HTP integrates these conditions by converting them into natural language descriptions. By extending an LLM's vocabulary to include the travel pattern tokens created in the first stage, the model can "reason" about how a trajectory should look based on a text prompt. Because LLMs handle sequences of varying lengths natively, HTP can generate trajectories that are not restricted to a fixed number of points, which makes the output more realistic than that of earlier fixed-length methods.
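The two ingredients described above, a text prompt built from conditions and a vocabulary extended with pattern tokens, can be sketched with a toy vocabulary. Everything here is hypothetical: the function names, the prompt wording, the condition keys, and the dictionary-based vocabulary are illustrative assumptions, not HTP's actual interface. With a real LLM, the embedding matrix would also need a new row for each added token.

```python
def extend_vocab(vocab, new_tokens):
    """Return a copy of `vocab` (token -> id) with unseen tokens appended.

    Assumes existing ids are contiguous from 0, so len() gives the next free id.
    """
    extended = dict(vocab)
    for tok in new_tokens:
        if tok not in extended:
            extended[tok] = len(extended)
    return extended

def build_prompt(conditions):
    """Turn a dict of generation conditions into one natural-language sentence."""
    parts = [f"{key.replace('_', ' ')}: {value}" for key, value in conditions.items()]
    return "Generate a trajectory with " + "; ".join(parts) + "."

def encode_patterns(tokens, vocab):
    """Map travel pattern tokens to their ids in the extended vocabulary."""
    return [vocab[t] for t in tokens]

base_vocab = {"the": 0, "road": 1}  # stand-in for an LLM's text vocabulary
vocab = extend_vocab(base_vocab, ["<STOP>", "<CRUISE>", "<SPEED_UP>", "<SLOW_DOWN>"])

prompt = build_prompt({"departure_time": "08:30", "travel_time": "15 minutes"})
target = encode_patterns(["<CRUISE>", "<SPEED_UP>", "<STOP>"], vocab)
print(prompt)
print(target)
```

Because the target is an ordinary token sequence, its length can differ from one trajectory to the next without any change to the model.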
Improving Accuracy and Realism
To ensure the generated data is high-quality, HTP uses a technique called residual quantization. This allows the model to learn in a "coarse-to-fine" manner, where it first identifies broad movement patterns and then refines them with finer details. Additionally, the model incorporates road network information to ensure that generated paths are geographically plausible. Experiments on real-world datasets show that this hierarchical strategy is highly effective, outperforming the strongest existing baseline methods by an average of 29.78% in generation quality.
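As a rough illustration of the coarse-to-fine idea, the sketch below applies generic residual quantization to a small vector: each level picks the nearest codeword to whatever the previous levels left unexplained. The codebooks are hand-picked toy values and the function names are assumptions; HTP's codebooks would be learned, and its exact formulation may differ.

```python
def nearest(codebook, vec):
    """Index of the codeword closest to `vec` in squared Euclidean distance."""
    def dist2(codeword):
        return sum((a - b) ** 2 for a, b in zip(codeword, vec))
    return min(range(len(codebook)), key=lambda i: dist2(codebook[i]))

def residual_quantize(vec, codebooks):
    """Quantize `vec` level by level; return (codes, final residual)."""
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # Later levels only see what this level failed to capture.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes, residual

def reconstruct(codes, codebooks):
    """Sum the selected codeword from every level."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out

# Toy codebooks (illustrative values): a coarse level, then a fine level.
coarse = [[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]]
fine = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

codes, residual = residual_quantize([9.0, 0.0], [coarse, fine])
print(codes, reconstruct(codes, [coarse, fine]), residual)
```

The first code captures the broad location of the vector and the second corrects the remaining error, which mirrors the broad-pattern-then-detail behavior described above.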
Key Takeaways
The HTP framework demonstrates that moving away from direct GPS point generation toward a pattern-based approach significantly improves the realism of synthetic mobility data. By combining the structural understanding of specialized encoders with the flexible, conditional reasoning of LLMs, the researchers have created a system that can produce diverse, variable-length trajectories that better reflect the complexities of real-world urban dynamics.