Google Launches Gemini-SQL2: 80% Accuracy on BIRD Leaderboard

Google has announced Gemini-SQL2, a new text-to-SQL capability powered by the Gemini 3.1 Pro model. The system achieved 80.04% execution accuracy on the BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation) single-model leaderboard. The result places the new capability ahead of Google's previous top-performing entry, Gemini-SQL.

Understanding Execution Accuracy

The BIRD benchmark is widely used to evaluate how well models translate natural-language questions into SQL queries. Unlike older benchmarks, it comprises 12,751 question-SQL pairs across 95 databases and 37 professional domains, includes dirty data, and requires grounding in external knowledge.
The 80.04% score reflects execution accuracy, meaning the generated SQL must not only appear valid but must also run successfully and return results that match the gold query. Google emphasizes that this metric ensures the system produces execution-ready queries rather than simply plausible-looking code. While this represents a high level of performance, it remains below the human benchmark of 92.96%, leaving a 12.92-point gap.
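To make the metric concrete, here is a minimal sketch of an execution-accuracy check in Python using the standard-library sqlite3 module. It is a simplified illustration, not the official BIRD evaluation script: the function names, the demo table, and the order-insensitive multiset comparison of rows are all assumptions made for this example.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Run a query and return its rows as a multiset, or None if it fails."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None  # SQL that does not execute can never count as correct
    return Counter(rows)  # ignore row order, keep duplicate rows


def execution_accuracy(conn, pairs):
    """Fraction of (predicted_sql, gold_sql) pairs whose results match."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    correct = 0
    for predicted_sql, gold_sql in pairs:
        predicted_rows = run_query(conn, predicted_sql)
        gold_rows = run_query(conn, gold_sql)
        if predicted_rows is not None and predicted_rows == gold_rows:
            correct += 1
    return correct / len(pairs)


def build_demo_db():
    """Create a tiny in-memory database (hypothetical schema for the demo)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
        INSERT INTO orders VALUES (1, 'ada', 10.0), (2, 'bob', 25.0), (3, 'ada', 5.0);
        """
    )
    return conn


# Each pair is (predicted SQL, gold SQL).
DEMO_PAIRS = [
    # Same result, different text: counts as correct.
    ("SELECT customer, SUM(amount) FROM orders GROUP BY customer",
     "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"),
    # Runs, but returns a different count: incorrect.
    ("SELECT COUNT(*) FROM orders WHERE amount > 5",
     "SELECT COUNT(*) FROM orders WHERE amount >= 5"),
    # References a column that does not exist: fails to execute, incorrect.
    ("SELECT total FROM orders",
     "SELECT SUM(amount) FROM orders"),
    # Different formulation, same result: counts as correct.
    ("SELECT MAX(amount) FROM orders",
     "SELECT amount FROM orders ORDER BY amount DESC LIMIT 1"),
]

if __name__ == "__main__":
    print(execution_accuracy(build_demo_db(), DEMO_PAIRS))
```

The second demo pair is the case the metric is designed to catch: the predicted query is valid SQL and looks plausible, yet it returns a different result from the gold query and is scored as wrong.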

Integration and Potential Applications

Gemini-SQL2 is designed to translate natural-language questions into executable SQL queries. Google has not yet confirmed which products will receive the update, but the company noted that improved SQL understanding could strengthen the natural-language query features of its existing data services. Potential integration targets include BigQuery Studio, AlloyDB AI, and Cloud SQL Studio, which already use Gemini-based SQL generation.
The capability is intended to assist with data-related tasks such as self-service analytics and data engineering. For example, users could request complex operations such as joins, window logic, and date arithmetic by asking questions in natural language. However, Google suggests that human review remains necessary in production environments, since an accuracy of roughly 80% on the benchmark implies that about one in five generated queries may still be incorrect.

Competitive Landscape

On the BIRD single-model leaderboard, Google now holds the top two positions with Gemini-SQL2 and Gemini-SQL. The single-model track is notable for restricting the use of complex preprocessing, retrieval, and agentic frameworks, thereby isolating the core text-to-SQL ability of the models.
The current leaderboard features a variety of systems, including specialized 32B SQL models that outperform several general frontier models. Despite the announcement, Google has not yet published a specific model string, API, or technical report for Gemini-SQL2. Developers looking to implement schema-grounded SQL generation currently rely on existing Gemini models via the google-genai SDK, with the expectation that they will be able to swap in the new capability once it is officially released.
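For readers who want to try schema-grounded SQL generation today, the following is a minimal sketch using the google-genai SDK with an existing Gemini model. No Gemini-SQL2 model string has been published, so the MODEL_NAME below is only a placeholder assumption, and the prompt wording and the helper names (build_prompt, strip_fences, generate_sql) are hypothetical choices made for this example.

```python
# Placeholder: an existing Gemini model, NOT a Gemini-SQL2 identifier.
MODEL_NAME = "gemini-2.5-flash"

FENCE = "`" * 3  # Markdown code-fence marker


def build_prompt(schema_ddl: str, question: str) -> str:
    """Ground the request in the schema so the model sees real table names."""
    return (
        "You are a text-to-SQL assistant. Use only the tables and columns "
        "in the schema below. Return a single SQL query and nothing else.\n\n"
        f"Schema:\n{schema_ddl.strip()}\n\n"
        f"Question: {question.strip()}\nSQL:"
    )


def strip_fences(text: str) -> str:
    """Remove a Markdown code fence if the model wrapped its answer in one."""
    text = text.strip()
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]  # drop the opening fence and language tag
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]  # drop the closing fence
        text = "\n".join(lines).strip()
    return text


def generate_sql(schema_ddl: str, question: str, model: str = MODEL_NAME) -> str:
    """Ask the model for a query. Needs network access and an API key."""
    from google import genai  # pip install google-genai; imported lazily

    client = genai.Client()  # picks up the API key from the environment
    response = client.models.generate_content(
        model=model,
        contents=build_prompt(schema_ddl, question),
    )
    return strip_fences(response.text or "")


if __name__ == "__main__":
    schema = "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL);"
    print(generate_sql(schema, "What is the total order amount per customer?"))
```

Keeping the model name in a single constant means a new identifier could be swapped in with a one-line change if Google publishes one. The generated SQL should still be reviewed before it runs against production data.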
