Semantic Layers for Reliable LLM-Powered Data Analytics: A Paired Benchmark of Accuracy and Hallucination Across Three Frontier Models
When using Large Language Models (LLMs) to query analytical databases, users often encounter incorrect answers or confidently stated hallucinations. This research investigates whether these failures occur because models are forced to guess business logic that isn't explicitly defined in the database schema. By providing LLMs with a "semantic layer"—a document containing specific business rules and definitions—the authors test whether this context can bridge the gap between natural-language questions and accurate database queries.
The Testing Protocol
The researchers benchmarked three frontier LLMs (Claude Opus 4.7, Claude Sonnet 4.6, and GPT-5.4) using 100 natural-language questions based on the Cleaned Contoso Retail Dataset in ClickHouse. To isolate the impact of business context, they used a paired single-shot protocol. Each model was tested twice: first, with access only to the raw database schema, and second, with the addition of a 4 KB markdown document that detailed the dataset’s measures, conventions, and disambiguation rules.
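The paired protocol above can be sketched as follows. This is a minimal illustration, not the authors' harness: the `ask_llm` grading callback, prompt wording, and function names are all assumptions.

```python
def build_prompt(question: str, schema: str, semantic_doc=None) -> str:
    """Assemble a single-shot text-to-SQL prompt.

    The semantic_doc argument is the optional ~4 KB markdown document
    describing measures, conventions, and disambiguation rules.
    """
    parts = ["You are a SQL analyst for a ClickHouse database.", schema]
    if semantic_doc is not None:
        parts.append(semantic_doc)
    parts.append(f"Question: {question}\nReturn one SQL query.")
    return "\n\n".join(parts)


def paired_accuracy(questions, schema, semantic_doc, grade):
    """Run every question twice -- schema-only vs. schema + semantic doc --
    and return the accuracy of each arm.

    `grade(question, prompt)` is a hypothetical callback that sends the
    prompt to a model, executes the returned SQL, and yields 1 (correct)
    or 0 (incorrect).
    """
    base_hits = doc_hits = 0
    for q in questions:
        base_hits += grade(q, build_prompt(q, schema))
        doc_hits += grade(q, build_prompt(q, schema, semantic_doc))
    n = len(questions)
    return base_hits / n, doc_hits / n
```

Because both arms see the identical question and schema, any accuracy difference is attributable to the semantic document alone.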
Significant Gains in Accuracy
The results demonstrate a clear performance boost when the semantic document is provided. Across all three models, accuracy improved by 17 to 23 percentage points. Interestingly, the study found that the specific choice of model mattered far less than the presence of the semantic layer. When the document was provided, all three models performed at a similar level (67.7–68.7% accuracy). Without the document, they also performed similarly to one another, but at a significantly lower baseline (45.5–50.5% accuracy).
A Structural Shift in Task Requirements
The authors conclude that the improvement is not necessarily due to the models becoming "smarter" or more capable. Instead, the semantic layer changes the nature of the task itself. By providing explicit business definitions, the model is no longer required to infer or guess the underlying logic of the data. This structural change effectively suppresses the most common types of text-to-SQL errors, suggesting that providing clear, human-authored context is a more reliable way to improve analytical accuracy than relying on the model's internal reasoning alone.
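To make the idea concrete, a few entries in such a semantic document might look like the following. This excerpt is illustrative only; the measure names and rules are assumptions in the style of the Contoso retail schema, not the authors' actual file.

```markdown
## Measures
- **Net Sales** = SUM(SalesAmount) - SUM(ReturnAmount); exclude cancelled orders.

## Conventions
- All currency values are in USD; do not convert.
- "Last year" means the most recent complete calendar year in the data.

## Disambiguation
- "Customers" means distinct customer keys with at least one sale,
  not all rows in the customer dimension table.
```

With definitions like these in the prompt, the model translates stated rules into SQL rather than inferring what "net sales" or "customers" ought to mean.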