Curated AI beats frontier LLMs at pharma asset discovery
This paper investigates how to improve the accuracy and completeness of AI tools used to scout pharmaceutical pipelines. General-purpose large language models (LLMs) are increasingly used to track drug development, but they often miss "long-tail" assets, such as early-stage preclinical programs or niche programs from smaller or international firms. The authors introduce "Gosset," an AI platform that replaces generic web search with a curated index of drug-asset annotations, and compare it against four leading frontier LLMs.

The Challenge of the "Long Tail"

When pharma analysts search for drugs targeting specific proteins or conditions, they require high recall (finding every real program) and high precision (avoiding fabricated names). Frontier LLMs perform well for high-profile, late-stage drugs that appear frequently in press releases. However, they often fail to capture the vast majority of the pipeline, which consists of preclinical, academic, and smaller biotech programs. Because these models rely on general web search, they struggle to find information that is sparsely indexed or buried in niche sources, and they are prone to hallucinating when asked to generate exhaustive lists.
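As a rough illustration of the two metrics above, the sketch below computes precision and recall for a single target query. The function name, the name-normalization step, and the placeholder drug names are assumptions made for illustration; they are not taken from the paper.

```python
def precision_recall(returned, verified_universe):
    """Return (precision, recall) for one target query.

    returned: asset names a system produced (hypothetical input format).
    verified_universe: all assets confirmed as real for that target.
    """
    # Normalize names so trivial case/whitespace differences do not count as misses.
    returned_set = {name.strip().lower() for name in returned}
    universe_set = {name.strip().lower() for name in verified_universe}

    true_positives = returned_set & universe_set

    # Precision: share of returned names that are real programs.
    precision = len(true_positives) / len(returned_set) if returned_set else 0.0
    # Recall: share of real programs that the system found.
    recall = len(true_positives) / len(universe_set) if universe_set else 0.0
    return precision, recall


# Placeholder names only; "Drug-X" stands in for a fabricated asset.
precision, recall = precision_recall(
    returned=["Drug-A", "Drug-B", "Drug-X"],
    verified_universe=["drug-a", "drug-b", "drug-c", "drug-d"],
)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.67 recall=0.50
```

In this toy case the system is mostly accurate but finds only half of the real programs, which is the pattern the article attributes to frontier LLMs.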

How Gosset Works

Gosset functions as a chat interface that, instead of searching the open web, queries a structured, curated index of drug data organized by target, modality, and indication. To test its effectiveness, the researchers ran a head-to-head comparison on ten niche oncology and immunology targets. All five systems (Gosset and four frontier LLMs) received the same natural-language queries and were required to return results in the same structured format. The researchers validated the results through a three-layer pipeline: deterministic auto-passing for known data, an "LLM-as-a-judge" cross-check, and final sign-off by human experts with pharmaceutical backgrounds.
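The three validation layers could be wired together roughly as follows. This is a minimal sketch, not the authors' implementation: it assumes that auto-passed assets skip the later layers and that the human reviewer sees the judge's opinion and has the final say. The `Verdict` type, the `validate` function, and the stub judge and reviewer are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Set


@dataclass
class Verdict:
    asset: str
    accepted: bool
    decided_by: str  # "auto-pass" or "human"


def validate(
    asset: str,
    known_assets: Set[str],
    llm_judge: Callable[[str], bool],
    human_review: Callable[[str, bool], bool],
) -> Verdict:
    # Layer 1: deterministic auto-pass for assets already in a known list.
    if asset.strip().lower() in known_assets:
        return Verdict(asset, True, "auto-pass")

    # Layer 2: an LLM judge gives a preliminary opinion (stubbed by the caller).
    judge_says_real = llm_judge(asset)

    # Layer 3: a human expert signs off, seeing the judge's opinion (assumption).
    accepted = human_review(asset, judge_says_real)
    return Verdict(asset, accepted, "human")


# Stubs standing in for a real judge model and a real reviewer.
known = {"drug-a"}
stub_judge = lambda asset: False
stub_human = lambda asset, judge_says_real: judge_says_real

print(validate("Drug-A", known, stub_judge, stub_human))
print(validate("Drug-Z", known, stub_judge, stub_human))
```

Passing the judge and reviewer in as callables keeps the control flow testable without calling a model or a person.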

Key Results

The study found that Gosset identified substantially more drug assets than the frontier models. Across the ten targets, Gosset returned 3.2 times more verified drugs than the best-performing frontier system. The frontier models were generally accurate (maintaining high precision), but they suffered from a major recall gap, missing the majority of the preclinical and early-stage assets that Gosset surfaced. In addition, because Gosset queries a structured database rather than running multiple live web searches, it answers in a fraction of the time, which makes for a faster, more interactive experience.

Limitations and Future Directions

The authors note that their "100% recall" metric is limited to the "discoverable universe" of drugs—those traceable to public sources like patents, conferences, and press releases. Programs that remain purely internal or undisclosed are invisible to all systems tested. Furthermore, the study acknowledges that the results may be biased toward targets where Gosset’s index is particularly well-populated. To address the recall gap in other models, the authors suggest that frontier LLMs can be connected to the Gosset index via the Model Context Protocol (MCP). This would allow these models to retain their natural language reasoning capabilities while offloading the task of asset enumeration to a specialized, curated database.
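To make the MCP suggestion concrete, the sketch below builds the kind of JSON-RPC 2.0 `tools/call` request an MCP client sends to invoke a server-side tool. The envelope follows the MCP convention of a `tools/call` method whose `params` carry `name` and `arguments`. The tool name `search_assets` and its `target` argument are purely hypothetical, since the article does not describe Gosset's actual MCP interface.

```python
import json


def build_tool_call(request_id, target, tool_name="search_assets"):
    """Build a JSON-RPC 2.0 request asking an MCP server to run a tool.

    tool_name and the "target" argument are illustrative assumptions,
    not a documented Gosset API.
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": tool_name,
            "arguments": {"target": target},
        },
    }


message = build_tool_call(1, "EXAMPLE_TARGET")
print(json.dumps(message, indent=2))
```

Under this arrangement the LLM would decide when to call the tool and how to explain the results, while the enumeration of assets would come from the curated index.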
