Paper Abstract

Embedding-based retrieval ranks items by their similarity to a query in a shared vector space and usually aims to return the highest-scoring items. In many production settings this is not what is wanted: given a seed set that expresses a fine-grained pattern, one needs more items that both satisfy a target attribute and stay within that pattern. We formalize this as pattern-preserving attribute retrieval. The two goals pull against each other: averaging the seeds preserves the pattern but stays in a low-attribute region, while global attribute retrieval drifts to unrelated patterns. We approach the task with continuous generative retrieval, where a model reads a sequence of item embeddings and generates query embeddings for nearest-neighbor search. We propose MO-DiT+HPPO, a staged framework with raw-sequence pretraining, multi-domain metric-ordered continuation pretraining, tail-centroid fine-tuning, and HPPO. Metric-ordered training turns sparse online retrieval labels into in-pattern trajectories ordered from low to high predicted attribute density, teaching one model the metric-improvement direction across domains. HPPO aligns the generated query distribution with the true online objective by labeling a hybrid candidate pool with the online intersection metric and applying reference-anchored preference optimization. A Pareto pair filter keeps only winner pairs that do not lower same-pattern purity, raising the attribute metric without sacrificing the pattern. Across four attribute domains under item- and pattern-holdout protocols, metric-ordered DiT improves the intersection metric over a pretrained generative retriever, and HPPO improves it further, with significant gains on seven of eight domain-split cells and a marginal tie on the hardest split. Metric-predictor validation, order ablations, CPT/SFT comparisons, and a candidate-policy ablation show where the gains come from.

Generative Retrieval via Diffusion Transformer with Metric-Ordered Sequence Training and Hybrid-Policy Preference Optimization
This paper addresses a common challenge in large-scale retrieval systems: finding items that satisfy a specific attribute (such as safety or quality) while remaining consistent with a fine-grained "pattern" or style defined by a seed set of items. Standard retrieval methods often struggle with this balance; they either drift toward unrelated items that happen to share the target attribute or stay too close to the seed items, failing to find higher-quality examples. The authors introduce a framework called MO-DiT+HPPO, which uses continuous generative retrieval to synthesize a query embedding that moves toward higher attribute density while preserving the original pattern.

The Challenge of Pattern-Preserving Retrieval

In many production environments, a small "seed set" of items defines a specific intent or style. The goal is to retrieve more items that match this style while also meeting a target attribute. The authors note that this creates a tension: simply averaging the seed embeddings keeps the pattern but leaves the query in a region where few items have the target attribute, while optimizing only for the attribute leads to "pattern drift," where the system retrieves items that have the attribute but look nothing like the seeds. The researchers formalize this as "pattern-preserving attribute retrieval" and define a primary metric, Joint@K, to measure success on both goals at once.
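The tension described above can be made concrete with a toy example. The sketch below is illustrative only: the summary does not give the exact definition of Joint@K, so `joint_at_k` assumes it is the share of the top-K retrieved items that both carry the target attribute and belong to the seed pattern. The catalog, labels, and query vectors are invented for the example, and brute-force cosine search stands in for a real vector index.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_k(query, item_embeddings, k):
    """Indices of the k items most similar to the query (brute-force search)."""
    ranked = sorted(range(len(item_embeddings)),
                    key=lambda i: cosine(query, item_embeddings[i]),
                    reverse=True)
    return ranked[:k]

def joint_at_k(query, item_embeddings, has_attribute, pattern_ids, seed_pattern, k):
    """ASSUMED definition: fraction of the top-k that has the attribute
    AND belongs to the seed pattern."""
    hits = top_k(query, item_embeddings, k)
    joint = sum(1 for i in hits
                if has_attribute[i] and pattern_ids[i] == seed_pattern)
    return joint / k

# Hypothetical catalog: pattern "a" has low- and high-attribute items,
# pattern "b" is an unrelated pattern where the attribute is common.
items = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.6], [0.7, 0.7], [0.0, 1.0], [0.1, 0.9]]
has_attribute = [False, False, True, True, True, True]
pattern_ids = ["a", "a", "a", "a", "b", "b"]

seed_mean = [0.95, 0.05]        # average of the two low-attribute seeds
attribute_only = [0.0, 1.0]     # global attribute direction
generated = [0.75, 0.65]        # stand-in for a generated query embedding

results = {
    name: joint_at_k(q, items, has_attribute, pattern_ids, "a", k=2)
    for name, q in [("seed_mean", seed_mean),
                    ("attribute_only", attribute_only),
                    ("generated", generated)]
}
print(results)
```

In this toy setup the seed mean stays among low-attribute items and the attribute-only query drifts into pattern "b", so both score zero, while a query placed between them retrieves in-pattern items that have the attribute.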

A Staged Training Framework

The MO-DiT+HPPO framework uses a four-stage pipeline to train a diffusion transformer to generate effective query embeddings:

  1. Raw-Sequence Pretraining: The model is first trained on large-scale data to learn a general prior for continuous retrieval.
  2. Metric-Ordered Continuation Pretraining: The researchers use a lightweight predictor to rank items within latent pattern clusters based on their predicted attribute density. By training the model on sequences that move from low-density to high-density items, the model learns the "direction" of improvement across different domains.
  3. Tail-Centroid Fine-Tuning: The model is fine-tuned to map a sequence of seed items to the "centroid" (the average) of a high-performing tail of items, which helps the model focus on high-quality results without being overly sensitive to a single noisy example.
  4. Hybrid-Policy Preference Optimization (HPPO): This final stage aligns the model with the true online objective. It uses a "hybrid-policy" candidate pool (combining deterministic constructions with the model's own generated samples) and applies a Pareto filter. This filter ensures that updates only proceed if they improve the attribute metric without degrading the pattern purity.
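Stages 2 and 3 can be sketched as plain data preparation. This is a minimal illustration, not the authors' pipeline: the function names, the `tail_size` parameter, the two-item seed prefix, and the toy scores and embeddings are all assumptions, and the predicted densities would come from the lightweight predictor in practice.

```python
def metric_ordered_sequence(item_ids, predicted_density):
    """Order one pattern cluster from low to high predicted attribute density."""
    return sorted(item_ids, key=lambda i: predicted_density[i])

def tail_centroid(sequence, embeddings, tail_size):
    """Mean embedding of the last `tail_size` items of an ordered sequence."""
    tail = sequence[-tail_size:]
    dim = len(embeddings[tail[0]])
    return [sum(embeddings[i][d] for i in tail) / len(tail) for d in range(dim)]

# Hypothetical cluster: four items from one latent pattern.
cluster = ["x1", "x2", "x3", "x4"]
predicted_density = {"x1": 0.7, "x2": 0.1, "x3": 0.9, "x4": 0.4}
embeddings = {
    "x1": [0.6, 0.4],
    "x2": [1.0, 0.0],
    "x3": [0.4, 0.6],
    "x4": [0.8, 0.2],
}

# Stage 2 (assumed form): a low-to-high trajectory the model learns to continue.
sequence = metric_ordered_sequence(cluster, predicted_density)

# Stage 3 (assumed form): low-density prefix as input, tail centroid as target.
seeds = sequence[:2]
target = tail_centroid(sequence, embeddings, tail_size=2)

print(sequence, seeds, target)
```

Averaging the tail rather than regressing onto the single highest-scoring item is what the summary credits with reducing sensitivity to one noisy example.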

Results and Performance

The researchers evaluated the framework on four attribute domains under item- and pattern-holdout protocols. Metric-ordered training improved the primary intersection metric (Joint@K) over a pretrained generative retriever. Adding HPPO improved it further, with significant gains on seven of the eight domain-split cells and a marginal tie on the hardest split. The Pareto filter is central to this result: it lets the model push the "attribute–pattern frontier" outward, meaning the system can reach higher attribute density without sacrificing the consistency of the retrieved patterns.
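The role of the Pareto filter can be shown with a short sketch. It assumes each candidate query in the hybrid pool has already been labeled with an online joint metric and a same-pattern purity score, and that a preference pair is kept only when the winner has a higher joint metric and no lower purity than the loser. The field names, the `purity_tolerance` parameter, and the comparison against the loser (rather than some other reference) are assumptions for illustration.

```python
from itertools import permutations

def pareto_pairs(candidates, purity_tolerance=0.0):
    """Return (winner, loser) name pairs where the winner has a strictly
    higher joint metric and does not lower purity beyond the tolerance."""
    pairs = []
    for winner, loser in permutations(candidates, 2):
        better_metric = winner["joint"] > loser["joint"]
        keeps_purity = winner["purity"] >= loser["purity"] - purity_tolerance
        if better_metric and keeps_purity:
            pairs.append((winner["name"], loser["name"]))
    return pairs

# Hypothetical hybrid pool: one deterministic construction plus two samples.
pool = [
    {"name": "seed_mean", "joint": 0.10, "purity": 0.90},
    {"name": "sampled_a", "joint": 0.30, "purity": 0.92},
    {"name": "sampled_b", "joint": 0.40, "purity": 0.60},
]

kept = pareto_pairs(pool)
print(kept)
```

Here `sampled_b` has the highest joint metric but loses purity, so no pair with it as the winner survives. Only the pair in which `sampled_a` beats `seed_mean` would be passed on to preference optimization.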

Key Takeaways

The approach keeps training and evaluation separate. The lightweight predictor is used only to order training sequences, while all final metrics are calculated using real, online top-K vector retrieval. Continuous generation allows the model to synthesize a query embedding that does not have to correspond to any single existing item, and the Pareto filter guards against pattern drift. Together, these let the framework navigate the trade-off between attribute-seeking and pattern preservation.
