Researchers introduce WRING, a new post-processing method that removes bias from vision-language models without creating unintended consequences.
Researchers from MIT, Worcester Polytechnic Institute, and Google have introduced a new debiasing technique called “Weighted Rotational DebiasING,” or WRING, designed to eliminate bias in vision-language models (VLMs) without the unintended consequences associated with current methods. The approach, which will be presented at the 2026 International Conference on Learning Representations, offers a more precise way to address the “Whac-a-Mole dilemma,” a phenomenon in which removing one bias inadvertently creates or amplifies others.
Bias in artificial intelligence is a significant safety concern, particularly in high-stakes fields like medicine, where models used for tasks such as classifying skin lesions could fail to identify high-risk patients if they are biased toward certain skin tones. While researchers have previously used a post-processing method known as “projection debiasing” to address these issues, the technique has notable drawbacks.
Projection debiasing removes biased information from a model’s embedding space by “projecting out” the subspace that encodes it. According to Walter Gerych, the paper’s first author and an assistant professor of computer science at Worcester Polytechnic Institute, this process inadvertently alters other learned relationships within the model. This leads to the “Whac-a-Mole dilemma,” where fixing one bias—such as racial bias in an image retrieval model—can unintentionally amplify another, such as gender bias.
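To make the projection idea concrete, here is a minimal NumPy sketch (not from the paper). It assumes a hypothetical unit “bias direction” `b` has already been identified in the embedding space, and removes each embedding’s component along that direction; the helper name `project_out` and the random data are illustrative only.

```python
import numpy as np

def project_out(embeddings, bias_direction):
    """Remove each embedding's component along bias_direction.

    This is the textbook projection step: x - (x . b_hat) * b_hat,
    which zeroes out the coordinate along the (hypothetical) bias axis.
    """
    b_hat = bias_direction / np.linalg.norm(bias_direction)
    return embeddings - np.outer(embeddings @ b_hat, b_hat)

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(4, 8))   # toy stand-in for model embeddings
b = rng.normal(size=8)                 # toy stand-in for a learned bias direction

debiased = project_out(embeddings, b)
# After projection, no embedding has any component along b.
print(np.allclose(debiased @ (b / np.linalg.norm(b)), 0))  # True
```

Note that the subtraction changes distances between embeddings, which is exactly why other learned relationships can shift, as the article describes.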
WRING addresses these limitations by rotating, rather than removing, coordinates within the model’s high-dimensional space. By rotating the coordinates responsible for bias to a new orientation, the transformation makes the model unable to distinguish between groups for a specific concept while leaving its other internal relationships intact.
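The key property a rotation buys over a projection can be sketched in a few lines of NumPy. This is not the paper’s algorithm; it only illustrates, with a hypothetical Givens rotation in one coordinate plane, that an orthogonal rotation preserves all pairwise distances between embeddings, whereas a projection does not.

```python
import numpy as np

def givens_rotation(dim, i, j, theta):
    """Orthogonal matrix rotating by theta in the (i, j) coordinate plane."""
    R = np.eye(dim)
    R[i, i] = R[j, j] = np.cos(theta)
    R[i, j] = -np.sin(theta)
    R[j, i] = np.sin(theta)
    return R

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 8))            # toy stand-in for model embeddings
R = givens_rotation(8, 0, 1, np.pi / 4)  # hypothetical choice of plane and angle
Xr = X @ R.T                           # rotate every embedding

# Rotation is orthogonal, so distances between embeddings are unchanged.
d_before = np.linalg.norm(X[0] - X[1])
d_after = np.linalg.norm(Xr[0] - Xr[1])
print(np.isclose(d_before, d_after))  # True
```

Because the map is orthogonal (`R @ R.T` is the identity), inner products and distances survive the transformation, which is the geometric intuition behind leaving the model’s other relationships intact.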
Because WRING is a post-processing approach, it can be applied to pre-trained VLMs such as OpenCLIP, an open-source implementation of OpenAI’s CLIP, without requiring the model to be retrained from scratch. Gerych notes that this makes the technique highly efficient and minimally invasive, as it avoids the significant resources and costs associated with training large models from the beginning.
In their research, the team found that WRING significantly reduced bias for target concepts without increasing bias in other areas. Currently, the application of WRING is limited to Contrastive Language-Image Pre-training (CLIP) models, which connect images to language for classification and search tasks.
The research team, which includes MIT graduate students Cassandra Parent and Quinn Perian, Google’s Rafiya Javed, and MIT associate professors Justin Solomon and Marzyeh Ghassemi, views the expansion of this technique as the next logical step. Gerych notes that extending WRING to ChatGPT-style generative language models is the team’s intended path forward. The project was supported by several organizations, including the National Science Foundation, the Gordon and Betty Moore Foundation, and the MIT-Google Computing Innovation Award.