TRACE: A Temporal Conditional Estimation for Multimodal Time Series Foundation Models

Paper Abstract

Time series foundation models (TS-FMs) aim to learn generalizable temporal representations that can be adapted to a wide range of downstream tasks. In real-world multimodal settings, time series are frequently affected by temporal misalignment and partial modality missingness, where different modalities are observed at heterogeneous time scales or are partially absent. Existing approaches typically rely on naive imputation or masking strategies, which fail to account for cross-modal dependencies and often lead to misaligned or degraded representations. We propose TRACE, a conditional estimation paradigm for multimodal time series foundation model pipelines under missingness and irregular sampling, allowing incomplete target modalities to be systematically inferred from available auxiliary modalities. We evaluate TRACE on diverse multimodal benchmarks spanning healthcare and affective computing, including the MIMIC-IV clinical dataset and the CMU-MOSI and CMU-MOSEI benchmarks for multimodal sentiment analysis. Across a range of downstream prediction tasks and missing-modality settings, TRACE consistently outperforms prior multimodal fusion approaches, demonstrating improved robustness to severe modality missingness and more reliable cross-modal representations.

Time series foundation models are designed to learn general patterns from data that transfer to downstream tasks such as medical diagnosis or sentiment analysis. However, real-world data is often messy: different sensors or sources may record information at different times, and some modalities may be missing entirely. Current methods often "fill in the blanks" with simple techniques such as repeating the last known value or masking the gap, which ignore what the other modalities reveal and can distort the data and degrade performance. This paper introduces TRACE, an approach that treats missing data as a variable to be estimated from the context provided by the other available modalities, rather than filled with a static guess.
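
To make the "repeating the last known value" baseline concrete, here is a minimal sketch in Python. The function name `forward_fill` and the sample heart-rate readings are illustrative assumptions and do not come from the paper.

```python
from typing import List, Optional


def forward_fill(values: List[Optional[float]]) -> List[Optional[float]]:
    """Replace each missing entry (None) with the most recent observed value.

    Entries before the first observation stay None, because there is
    nothing to carry forward yet.
    """
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


# Hypothetical heart-rate readings with three missing time steps.
heart_rate = [72.0, None, None, None, 118.0]
print(forward_fill(heart_rate))  # [72.0, 72.0, 72.0, 72.0, 118.0]
```

The filled series shows a flat signal followed by an abrupt jump, even though the true value may have risen gradually. A second modality recorded during the gap could have revealed that trend, which is the kind of information a static fill discards.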

A Two-Stage Pipeline

TRACE organizes the modeling process into two stages. The first is a "multimodal conditional diffusion" process. Instead of filling missing values with a fixed guess, the model draws on the data present in other modalities as context. A gating mechanism decides which auxiliary data is most relevant, and a diffusion-based model then probabilistically estimates the missing parts of the target modality. Once the modalities have been completed in this way, the second stage uses a Mixture-of-Experts (MoE) fusion layer to combine their representations into a unified representation for the final predictions.
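
The second stage can be pictured with a small sketch of a softly gated Mixture-of-Experts fusion layer. This is an assumption-laden illustration, not the paper's implementation: the names `moe_fuse`, `linear`, `experts`, and `router` are hypothetical, the experts are plain linear maps, and dense soft gating is assumed (the summary does not say whether TRACE uses dense or top-k routing). In practice the weights would be learned; here they are fixed by hand.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[Vector]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights: Matrix, x: Vector) -> Vector:
    """Matrix-vector product: one output value per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def moe_fuse(modalities: List[Vector], experts: List[Matrix], router: Matrix) -> Vector:
    """Fuse completed modality embeddings with a softly gated mixture of experts."""
    x = [v for m in modalities for v in m]        # concatenate modality embeddings
    gate = softmax(linear(router, x))             # one mixing weight per expert
    outputs = [linear(e, x) for e in experts]     # each expert maps x to the fused space
    dim = len(outputs[0])
    return [sum(g * out[d] for g, out in zip(gate, outputs)) for d in range(dim)]


# Two hypothetical modality embeddings (dimension 2 each) and two experts.
modalities = [[1.0, 2.0], [3.0, 4.0]]
experts = [
    [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]],  # expert 0 reads modality 0
    [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]],  # expert 1 reads modality 1
]
router = [[0.0] * 4, [0.0] * 4]                    # zero router -> equal gate weights
print(moe_fuse(modalities, experts, router))       # [2.0, 3.0]
```

With a zero router both experts receive weight 0.5, so the fused vector is the average of the two expert outputs. A trained router would shift that weight toward whichever expert is more useful for a given input.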

Leveraging Cross-Modal Context

The core innovation of TRACE is cross-modal conditional estimation. An MoE-inspired gating mechanism assigns importance scores to the auxiliary modalities, which lets the system prioritize reliable, informative data and down-weight noisy or irrelevant signals. The observed parts of the target modality are then concatenated with this context-aware aggregation of the other modalities to form the input that conditions the diffusion process. As a result, the estimated values are grounded in the relationships between the different types of data rather than being unconditioned guesses.
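
The gating and concatenation steps described above can be sketched as follows. This is a simplified illustration under stated assumptions: `gate_auxiliary` and `build_condition` are hypothetical names, the importance scores are passed in by hand rather than produced by a learned gate, and the observation mask appended to the input is an added assumption that the summary does not mention.

```python
import math
from typing import List, Optional, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax that turns raw scores into weights summing to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gate_auxiliary(aux: List[List[float]], scores: Sequence[float]) -> List[float]:
    """Importance-weighted sum of auxiliary modality features."""
    weights = softmax(scores)
    dim = len(aux[0])
    return [sum(w * a[d] for w, a in zip(weights, aux)) for d in range(dim)]


def build_condition(
    target: List[Optional[float]],
    aux: List[List[float]],
    scores: Sequence[float],
) -> List[float]:
    """Concatenate observed target values, an observation mask, and gated context.

    Missing target entries (None) are zero-filled and flagged with mask 0.0
    (an assumption made for this sketch).
    """
    observed = [0.0 if v is None else v for v in target]
    mask = [0.0 if v is None else 1.0 for v in target]
    return observed + mask + gate_auxiliary(aux, scores)


# Hypothetical target with one missing step and two auxiliary modalities.
target = [1.5, None]
aux = [[2.0, 4.0], [4.0, 8.0]]
print(build_condition(target, aux, scores=[0.0, 0.0]))
# [1.5, 0.0, 1.0, 0.0, 3.0, 6.0]
```

Equal scores average the two auxiliary modalities. Raising one score pushes the aggregated context toward that modality, which is how a gate of this kind can favor the more informative source. The resulting vector would then be the conditioning input to the estimator.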

Improved Performance and Robustness

The researchers evaluated TRACE on several benchmarks: the MIMIC-IV clinical dataset and the CMU-MOSI and CMU-MOSEI multimodal sentiment analysis datasets. Across a range of downstream prediction tasks and missing-modality settings, TRACE consistently outperformed prior multimodal fusion approaches. By treating missing data as a latent variable to be estimated rather than a value to be filled, the model produces internal representations that are closer to those of fully observed sequences. This leads to more reliable downstream predictions, particularly when data is sparse or irregularly sampled, and supports temporal conditional estimation as an effective strategy for building robust foundation models.
