TRACE: A Temporal Conditional Estimation for Multimodal Time Series Foundation Models
Time series foundation models are designed to learn general patterns from data to help with various tasks, such as medical diagnosis or sentiment analysis. However, real-world data is often messy; different sensors or sources may record information at different times, or some information may be missing entirely. Current methods often try to "fill in the blanks" using simple techniques like repeating the last known value, which can distort the data and lead to poor performance. This paper introduces TRACE, a new approach that treats missing data as a variable to be estimated using the context provided by other available modalities, rather than just filling it with a static guess.
A Two-Stage Pipeline
TRACE organizes the modeling process into two distinct stages. First, it uses a "multimodal conditional diffusion" process. Instead of simply guessing missing values, the model looks at the data that is present in other modalities to understand the context. It uses a gating mechanism to decide which auxiliary data is most relevant, then uses a diffusion-based model to probabilistically estimate what the missing parts of the target modality should look like. Once the data is complete and coherent, the second stage uses a Mixture-of-Experts (MoE) fusion layer to combine these representations into a unified format that can be used for final predictions.
Leveraging Cross-Modal Context
The core innovation of TRACE is its ability to perform cross-modal conditional estimation. By using an MoE-inspired gating mechanism, the model assigns importance scores to different auxiliary modalities. This allows the system to prioritize reliable, informative data while ignoring noisy or irrelevant signals. By concatenating the observed parts of a target modality with this context-aware aggregation of other modalities, the model creates a rich, informed input for the diffusion process. This ensures that the estimated values are not just random guesses, but are grounded in the actual relationships between different types of data.
Improved Performance and Robustness
The researchers evaluated TRACE across several benchmarks, including the MIMIC-IV clinical dataset and the CMU-MOSI and CMU-MOSEI sentiment analysis datasets. Across these diverse tasks, TRACE consistently outperformed existing methods. By treating missing data as a latent variable to be estimated rather than a value to be filled, the model produces internal representations that are much closer to the "ground truth" of fully observed sequences. This leads to more reliable downstream predictions, particularly in scenarios where data is sparse or irregularly sampled, proving that temporal conditional estimation is a highly effective strategy for building robust foundation models.
Comments (0)
to join the discussion
No comments yet
Be the first to share your thoughts!