Improved techniques for fine-tuning flow models via adjoint matching: a deterministic control pipeline

Key Takeaways

  • This research introduces a new method, a deterministic control pipeline based on adjoint matching, for aligning generative flow models with human preferences.
  • We propose a deterministic adjoint matching framework that formulates human preference alignment for flow-based generative models as an optimal control problem over velocity fields.
  • The control is regressed directly toward a value-gradient-induced target under the current policy, yielding a simple and stable training objective.
  • We further generalize the framework beyond standard KL-based regularization, allowing more flexible trade-offs between alignment strength and distributional preservation.
  • Experiments on SiT-XL/2 and FLUX.2-Klein-4B demonstrate consistent gains across multiple alignment metrics, along with substantially improved diversity and mode preservation.
Paper Abstract

We propose a deterministic adjoint matching framework that formulates human preference alignment for flow-based generative models as an optimal control problem over velocity fields. One can directly regress the control toward a value-gradient-induced target under the current policy, leading to a simple and stable training objective. Building on this perspective, we introduce a truncated adjoint scheme that focuses computation on the terminal portion of the trajectory, where reward-relevant signals concentrate, which yields substantial computational savings while preserving alignment quality. We further generalize the framework beyond standard KL-based regularization, allowing more flexible trade-offs between alignment strength and distributional preservation. Experiments on SiT-XL/2 and FLUX.2-Klein-4B demonstrate consistent gains across multiple alignment metrics, along with substantially improved diversity and mode preservation.

Improved techniques for fine-tuning flow models via adjoint matching: a deterministic control pipeline
This research introduces a new method for aligning generative flow models—such as those used for high-quality image generation—with human preferences. While base models are excellent at creating diverse content, they often struggle to follow specific prompts or meet human quality standards. The authors propose a "deterministic adjoint matching" framework that treats the fine-tuning process as an optimal control problem. By adjusting the velocity fields that guide the model's generation process, the researchers can steer the model toward more desirable outputs while maintaining the efficiency and stability of the original architecture.

A New Control Pipeline

The core of this approach is a deterministic control pipeline. Instead of relying on traditional reinforcement learning, which can be unstable and computationally demanding, the authors formulate the fine-tuning process as a way to learn a "velocity perturbation." This perturbation is added to the pretrained model, effectively guiding it toward a target that reflects human preferences. By using a deterministic framework, the model remains compatible with the efficient, high-speed sampling methods already used in state-of-the-art flow-based models like FLUX.2.
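The sketch below illustrates this setup in PyTorch. It is a minimal, hypothetical example rather than the paper's code: the names ControlledVelocity and euler_sample, and the fixed-step Euler integrator, are assumptions made for illustration. The point it conveys is that the pretrained velocity field stays frozen and the deterministic ODE sampler is unchanged; only a small additive perturbation is trained.

import torch
import torch.nn as nn

class ControlledVelocity(nn.Module):
    """Frozen pretrained velocity field plus a trainable perturbation (illustrative sketch)."""

    def __init__(self, base_model: nn.Module, control_net: nn.Module):
        super().__init__()
        self.base = base_model        # pretrained velocity v_theta(x, t), kept frozen
        self.control = control_net    # trainable perturbation u_phi(x, t)
        for p in self.base.parameters():
            p.requires_grad_(False)

    def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Controlled velocity = pretrained drift + learned correction.
        return self.base(x, t) + self.control(x, t)

@torch.no_grad()
def euler_sample(model: nn.Module, x0: torch.Tensor, num_steps: int = 50) -> torch.Tensor:
    """Deterministic fixed-step Euler integration of the controlled ODE, from noise x0 to a sample."""
    dt = 1.0 / num_steps
    x = x0
    for i in range(num_steps):
        t = torch.full((x.shape[0],), i * dt, device=x.device)
        x = x + dt * model(x, t)
    return x

Because sampling remains a deterministic ODE solve, any fast sampler already used with the pretrained model can be reused unchanged after fine-tuning.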

Truncated Adjoint Matching

A significant challenge in fine-tuning large generative models is the high computational cost of calculating the "adjoint"—a mathematical tool used to determine how changes in the model affect the final output. The authors observe that the most critical information for reward alignment is concentrated in the final stages of the generation process. To capitalize on this, they introduce a "truncated adjoint scheme." By focusing the computational effort only on these terminal steps, the method achieves substantial speedups—reducing the time required per update from 345 seconds to 32 seconds on large models—without sacrificing the quality of the alignment.
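A rough sketch of how such truncation might look in PyTorch is shown below. The function name, the autograd-based adjoint computation, and the step counts are assumptions for illustration, not the paper's implementation: the early portion of the trajectory is simulated without building a computation graph, and reward gradients are propagated back only through the last few steps.

import torch

def truncated_adjoint_states(model, reward_fn, x0, num_steps=50, truncate_k=8):
    """Illustrative sketch: compute reward gradients (adjoint states) only over
    the terminal `truncate_k` steps of a deterministic Euler trajectory."""
    dt = 1.0 / num_steps
    x = x0

    # Early steps: plain simulation, no autograd graph is built, so this part is cheap.
    with torch.no_grad():
        for i in range(num_steps - truncate_k):
            t = torch.full((x.shape[0],), i * dt, device=x.device)
            x = x + dt * model(x, t)

    # Terminal steps: build the graph and retain intermediate states so their
    # gradients survive backward().
    x = x.detach().requires_grad_(True)
    states = []
    for i in range(num_steps - truncate_k, num_steps):
        states.append(x)
        x.retain_grad()
        t = torch.full((x.shape[0],), i * dt, device=x.device)
        x = x + dt * model(x, t)

    # Backpropagate the terminal reward through the truncated segment only.
    reward_fn(x).sum().backward()

    # states[j].grad approximates the adjoint dR/dx at that step; detached, these
    # serve as regression targets for the control. In practice the velocity
    # model's own parameters would be frozen or their gradients discarded here.
    return [s.grad.detach() for s in states]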

Beyond Standard Regularization

To prevent the model from "reward hacking" (where it produces high-scoring but low-quality or repetitive images), researchers typically use KL-based regularization to keep the fine-tuned model close to the original. This paper moves beyond the standard quadratic cost, proposing more flexible, higher-order polynomial regularizers. These allow for a more nuanced trade-off between strictly following the original model’s distribution and pushing for higher reward scores. This flexibility helps the model maintain diversity and prevents the "mode collapse" often seen in other fine-tuning approaches.
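As a compact illustration of this kind of generalized control cost (the function name and parameters below are assumptions; the paper's exact regularizer is not reproduced here): a power of 2 corresponds to the standard quadratic, KL-style penalty on the velocity perturbation, while larger powers punish big deviations from the pretrained model more sharply and leave small corrections comparatively cheap.

import torch

def control_penalty(u: torch.Tensor, power: float = 2.0, weight: float = 0.1) -> torch.Tensor:
    """Polynomial penalty on the velocity perturbation u (illustrative sketch).

    power=2.0 recovers the usual quadratic control cost; power>2 tolerates small
    perturbations but penalizes large ones harder, giving a different trade-off
    between alignment strength and staying close to the pretrained distribution.
    """
    # Per-sample norm of the perturbation, raised to `power`, averaged over the batch.
    return weight * u.flatten(1).norm(dim=1).pow(power).mean()

In a training loop, a term like this would simply be added to the alignment objective, with the weight and power chosen per task.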

Empirical Performance

The researchers tested their framework on two powerful backbones, SiT-XL/2 and FLUX.2-Klein-4B. The results show consistent improvements across several key metrics, including aesthetic scores, image-reward ratings, and prompt adherence. Beyond just improving scores, the method successfully preserved the diversity of the generated images and reduced mode collapse, demonstrating that the model can be effectively aligned with human preferences while retaining the foundational strengths of the original, pretrained generative system.
