Boosting Brain-to-Image Decoding with TRIBE v2 Data Augmentation
Brain-to-image decoding, the process of reconstructing or identifying the images a person is viewing from their brain activity, is currently limited by the high cost and time required to collect large amounts of neural data. Because modern decoders require thousands of stimulus-response pairs, they are out of reach for many research laboratories. This paper investigates whether synthetic data can overcome this bottleneck. Using TRIBE v2, a large-scale model pretrained on over 1,000 hours of fMRI responses to sight, sound, and language, the researchers show that synthetic brain responses generated for new images can supplement limited real-world datasets.
How the Approach Works
The researchers use TRIBE v2 as a "synthetic fMRI generator." Although TRIBE v2 was not originally trained on static images, the team adapted it by presenting each image as a short, static video. They then defined an "operating grid" of real/synthetic mixtures: each point on the grid keeps a fixed fraction of the real fMRI data and adds a chosen amount of synthetic responses generated by TRIBE v2. On each mixture, they trained decoders that map brain activity to image representations. This lets the decoder learn from a much larger pool of stimuli than was actually recorded in the lab.
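The pipeline above can be sketched in a few lines, with the caveat that everything below is an illustrative assumption rather than the authors' code. `image_to_static_video` simply repeats one image along a time axis (the frame count of 8 and the `(T, H, W, C)` layout are assumptions; TRIBE v2's real input interface is not described here). `build_mixed_training_set` produces one hypothetical operating point from precomputed real and synthetic (fMRI, image-embedding) pairs, and `fit_ridge_decoder` is a closed-form ridge regression from voxels to embeddings, a common baseline decoder that is not necessarily the one used in the paper. Random arrays stand in for both real and TRIBE v2-generated responses.

```python
import numpy as np


def image_to_static_video(image: np.ndarray, num_frames: int = 8) -> np.ndarray:
    """Repeat one (H, W, C) image along a new time axis: (num_frames, H, W, C)."""
    return np.repeat(image[np.newaxis, ...], num_frames, axis=0)


def build_mixed_training_set(real_fmri, real_emb, synth_fmri, synth_emb,
                             real_fraction, n_synthetic, rng):
    """One operating point: a random subset of real pairs plus n_synthetic synthetic pairs."""
    n_real = int(round(real_fraction * len(real_fmri)))
    real_idx = rng.choice(len(real_fmri), size=n_real, replace=False)
    synth_idx = rng.choice(len(synth_fmri), size=n_synthetic, replace=False)
    X = np.concatenate([real_fmri[real_idx], synth_fmri[synth_idx]], axis=0)
    Y = np.concatenate([real_emb[real_idx], synth_emb[synth_idx]], axis=0)
    return X, Y


def fit_ridge_decoder(X, Y, alpha=1.0):
    """Closed-form ridge regression W = (X^T X + alpha*I)^-1 X^T Y (voxels -> embedding)."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ Y)


# Toy usage: random stand-ins for real and synthetic (TRIBE v2-like) responses.
rng = np.random.default_rng(0)
video = image_to_static_video(np.zeros((32, 32, 3)), num_frames=8)
n_vox, d_emb = 50, 16
W_true = rng.normal(size=(n_vox, d_emb))
real_X = rng.normal(size=(200, n_vox))
real_Y = real_X @ W_true
synth_X = rng.normal(size=(1000, n_vox))
synth_Y = synth_X @ W_true + 0.1 * rng.normal(size=(1000, d_emb))
X, Y = build_mixed_training_set(real_X, real_Y, synth_X, synth_Y,
                                real_fraction=0.25, n_synthetic=500, rng=rng)
W = fit_ridge_decoder(X, Y)
print(video.shape, X.shape)  # (8, 32, 32, 3) (550, 50)
```

Setting `real_fraction=0.0` in this sketch gives a synthetic-only training set, which is the setting behind the zero-shot result discussed below.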
Key Results
The study tested this method on two major fMRI datasets: the 7T Natural Scenes Dataset (NSD) and the 3T BOLD5000 dataset. The results show that adding synthetic data significantly improves performance in low-to-medium data regimes. Specifically, the researchers observed up to a 68% improvement in Top-10 image-retrieval accuracy compared to decoders trained only on real data. In some cases, this approach reached 90% of the performance of a full-data model while using only a fraction of the actual scanning time, potentially saving hours of expensive fMRI data collection per subject.
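Top-k image-retrieval accuracy is commonly computed by ranking every candidate image by its similarity to the decoded embedding and counting a hit when the image actually shown lands in the top k. The sketch below assumes cosine similarity and a candidate pool made of the test images themselves; the paper's exact retrieval protocol (pool size, similarity measure) is not stated here, so treat `top_k_retrieval_accuracy` as a hypothetical helper.

```python
import numpy as np


def top_k_retrieval_accuracy(pred_emb: np.ndarray, cand_emb: np.ndarray, k: int = 10) -> float:
    """Fraction of trials whose correct image ranks in the top k by cosine similarity.

    pred_emb[i] is the decoded embedding for test trial i, and cand_emb[i] is the
    embedding of the image actually shown on that trial; all rows of cand_emb
    form the candidate pool.
    """
    pred = pred_emb / np.linalg.norm(pred_emb, axis=1, keepdims=True)
    cand = cand_emb / np.linalg.norm(cand_emb, axis=1, keepdims=True)
    sims = pred @ cand.T                          # (n_trials, n_candidates)
    correct = np.diag(sims)                       # similarity to the true image
    rank = (sims > correct[:, None]).sum(axis=1)  # candidates that beat the true image
    return float(np.mean(rank < k))


rng = np.random.default_rng(0)
images = rng.normal(size=(100, 16))
perfect = top_k_retrieval_accuracy(images, images, k=10)
noisy = top_k_retrieval_accuracy(images + rng.normal(size=images.shape), images, k=10)
print(perfect)  # 1.0
```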
Zero-Shot Potential and Limitations
A surprising finding is that in some settings, decoders trained exclusively on synthetic fMRI data performed above chance levels. This suggests that TRIBE v2 possesses a degree of "zero-shot" capability, meaning it can translate visual information into brain-like signals even without seeing specific real-world brain data for those images.
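"Above chance" has a concrete meaning for Top-10 retrieval: if the correct image is one of N candidates and the ranking were random, it would land in the top 10 with probability 10/N. One simple way to check a hit count against that level is a one-sided binomial test, sketched below. This assumes independent trials (only approximately true when trials share a candidate pool), and the 100 candidates and 20 hits are hypothetical numbers for illustration, not results from the paper.

```python
from math import comb


def top_k_chance_level(n_candidates: int, k: int = 10) -> float:
    """Probability that a uniformly random ranking puts the correct image in the top k."""
    return min(k, n_candidates) / n_candidates


def binomial_p_value(n_hits: int, n_trials: int, p_chance: float) -> float:
    """One-sided P(X >= n_hits) for X ~ Binomial(n_trials, p_chance)."""
    return sum(
        comb(n_trials, i) * p_chance**i * (1 - p_chance) ** (n_trials - i)
        for i in range(n_hits, n_trials + 1)
    )


chance = top_k_chance_level(n_candidates=100, k=10)  # 0.1
p = binomial_p_value(n_hits=20, n_trials=100, p_chance=chance)
```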
However, the authors emphasize that this augmentation is not a "plug-and-play" solution. The benefits depend strongly on the dataset and the type of decoder used. Moreover, the gains eventually saturate: beyond a certain point, adding more synthetic data stops helping and can even hurt performance. The ratio of real to synthetic data therefore needs careful calibration to achieve the best results.
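One way to carry out that calibration, sketched under stated assumptions: fix the real-data budget (the expensive part), sweep the amount of synthetic data, and keep whichever amount scores best on held-out real fMRI. `train_and_score` is a hypothetical callback standing in for "train a decoder on this mixture and evaluate it"; including 0 in the sweep keeps the real-only baseline in the comparison, so augmentation is only adopted when it actually helps. The toy score function exists purely to exercise the code and is not derived from the paper.

```python
from typing import Callable, Dict, Iterable, Tuple


def best_synthetic_amount(
    real_fraction: float,
    synthetic_counts: Iterable[int],
    train_and_score: Callable[[float, int], float],
) -> Tuple[int, Dict[int, float]]:
    """For a fixed real-data budget, score each synthetic amount and return the best one.

    train_and_score is a hypothetical callback that trains a decoder on the given
    mixture and returns a validation metric computed on held-out *real* fMRI data.
    """
    scores = {n: train_and_score(real_fraction, n) for n in synthetic_counts}
    best = max(scores, key=scores.get)
    return best, scores


# Toy score that peaks and then declines as synthetic data grows (illustrative only).
def toy_score(real_fraction: float, n_synthetic: int) -> float:
    return real_fraction - ((n_synthetic - 3000) / 10000) ** 2


best, scores = best_synthetic_amount(0.1, [0, 1000, 3000, 10000], toy_score)
print(best)  # 3000
```

Repeating this sweep for several real-data fractions traces out one row of the operating grid per budget.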