Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and their collaborators have developed a new method called CompreSSM that allows artificial intelligence models to shed unnecessary complexity while they are still learning. By identifying and removing "dead weight" early in the training process, the technique enables models to become leaner and faster without sacrificing performance, effectively bypassing the traditional trade-off between training a massive model and settling for a smaller, less capable one.
A Control-Theory Approach to Compression
CompreSSM targets state-space models, a family of AI architectures used in applications ranging from robotics to audio generation and language processing. The researchers use a mathematical tool from control theory, Hankel singular values, to measure how much each internal state contributes to the model's overall behavior. Because the relative importance of these states stabilizes early, the team can reliably rank which dimensions are essential after only about 10 percent of the training process. Once the ranking is established, the less important components are removed, and the remaining 90 percent of training proceeds at the speed of a much smaller model.
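To make the ranking idea concrete, here is a minimal sketch of how Hankel singular values can be computed for a small, stable, discrete-time linear state-space system using NumPy and SciPy. It is not the researchers' implementation: the toy matrices, the function names `hankel_singular_values` and `states_to_keep`, and the 99 percent "energy" cutoff are all assumptions made for illustration. The sketch also stops at ranking and does not perform the actual removal of states.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov


def hankel_singular_values(A, B, C):
    """Hankel singular values of the stable discrete-time system
    x[k+1] = A x[k] + B u[k],  y[k] = C x[k], sorted largest first."""
    # Controllability Gramian P solves  A P A^T - P + B B^T = 0.
    P = solve_discrete_lyapunov(A, B @ B.T)
    # Observability Gramian Q solves  A^T Q A - Q + C^T C = 0.
    Q = solve_discrete_lyapunov(A.T, C.T @ C)
    # Hankel singular values are the square roots of the eigenvalues of P Q.
    eigvals = np.linalg.eigvals(P @ Q).real
    return np.sort(np.sqrt(np.clip(eigvals, 0.0, None)))[::-1]


def states_to_keep(hsv, energy=0.99):
    """Hypothetical selection rule (an assumption, not from the article):
    the smallest rank whose singular values hold `energy` of the total."""
    cumulative = np.cumsum(hsv) / np.sum(hsv)
    rank = int(np.searchsorted(cumulative, energy)) + 1
    return min(rank, len(hsv))


# Toy system (illustrative only): 6 states, 2 inputs, 3 outputs.
rng = np.random.default_rng(0)
A = np.diag(np.linspace(0.1, 0.9, 6))   # stable: all eigenvalues inside the unit circle
B = rng.standard_normal((6, 2))
C = rng.standard_normal((3, 6))

hsv = hankel_singular_values(A, B, C)
print("Hankel singular values:", hsv)
print("States to keep at 99% energy:", states_to_keep(hsv))
```

States associated with very small Hankel singular values are both hard to excite from the input and barely visible at the output, which is why they are natural candidates for removal. In a training setting, a calculation like this would presumably be run on the model's learned state matrices at the chosen point in training rather than on random matrices.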
Efficiency Gains and Performance
On image classification benchmarks, compressed models maintained nearly the same accuracy as their full-sized counterparts while training up to 1.5 times faster. In tests using Mamba, a widely used state-space architecture, the method achieved roughly fourfold training speedups by compressing a 128-dimensional model down to 12 dimensions while maintaining competitive performance. Conventional pruning and knowledge distillation often require training a large model to completion or running multiple models simultaneously. CompreSSM instead makes informed compression decisions mid-stream, avoiding those additional computational costs.
A New Standard for Model Training
The technique also comes with a pragmatic safety net: if a compression step leads to an unexpected drop in performance, practitioners can revert to a previously saved checkpoint. The method is most effective on multi-input, multi-output models, where the relationship between state size and performance is strongest, but the researchers see the work as a foundational step. Future work aims to extend the technique to matrix-valued dynamical systems used in linear attention mechanisms, potentially bringing the benefits of CompreSSM to the transformer architectures that underpin many of today’s largest AI systems. The research, which was supported by organizations including the Max Planck ETH Center for Learning Systems, Boeing, and the U.S. Office of Naval Research, will be presented at the International Conference on Learning Representations 2026.
