Author: Denis Avetisyan
A new system intelligently combines smaller, efficient AI models with larger language models to translate natural language questions into database queries.
This research introduces a cost-effective, schema-aware NL2SQL approach leveraging a hybrid architecture of Small Language Models and Large Language Models.
While natural language to SQL (NL2SQL) systems promise broader data accessibility, reliance on large language models (LLMs) introduces substantial computational and privacy concerns. This paper presents ‘An Agentic System for Schema Aware NL2SQL Generation’, a novel framework that strategically combines small language models (SLMs) as primary query generators with a selective LLM fallback mechanism. This hybrid architecture achieves a significant reduction in operational costs – over 90% compared to LLM-only baselines – while maintaining competitive execution accuracy on the BIRD benchmark. Could this agentic approach unlock practical, cost-effective NL2SQL solutions for resource-constrained environments and democratize access to valuable data insights?
The Challenge of Translating Intent to Query
The translation of natural language into Structured Query Language (SQL) – commonly known as the NL2SQL task – represents a considerable challenge at the intersection of artificial intelligence and data management. This difficulty stems from the inherent ambiguity of human language and the rigid precision required by database queries. Unlike conversational interactions, where context and inference often resolve unclear statements, SQL demands explicitly defined criteria for data retrieval. A seemingly simple question – “How many employees are in the sales department?” – requires the system to accurately identify the relevant table, column names, and filtering conditions within a potentially complex database schema. Successfully bridging this semantic gap necessitates models capable of not only understanding the intent behind a question but also accurately mapping that intent to the specific structure and constraints of the underlying database, a task that continues to push the boundaries of current AI capabilities.
Early attempts at converting natural language into SQL queries often faltered when faced with the intricacies of real-world databases and the ambiguity inherent in human language. These traditional methods, frequently relying on keyword matching and simplistic grammatical rules, proved inadequate for handling complex queries involving multiple tables, nested conditions, or nuanced database schemas. The result was a high rate of inaccurate translations, where the generated SQL failed to retrieve the intended information or, worse, returned entirely incorrect results. This limitation stemmed from an inability to fully capture the semantic meaning of the natural language question and map it correctly onto the relational structure of the database – a challenge that demanded more sophisticated approaches capable of understanding context, relationships, and the underlying intent of the query.
Modular Systems: Deconstructing the Query
Agentic systems address the Natural Language to SQL (NL2SQL) task by modularizing the query generation process into discrete, specialized agents. This contrasts with traditional end-to-end models by decomposing the complex task into a series of simpler, manageable steps, each handled by a dedicated agent. These agents operate sequentially, passing information between each other to refine and ultimately construct the SQL query. This modular approach allows for greater interpretability, easier debugging, and improved performance through focused specialization of each agent’s function within the overall system.
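The sequential hand-off described above can be sketched as a small pipeline in which each agent reads and enriches a shared context object. The agent names, the context fields, and the stub behaviors below are illustrative assumptions, not the paper's implementation; in a real system each stub would wrap a model call or database introspection.

```python
from dataclasses import dataclass, field

@dataclass
class QueryContext:
    """Mutable state handed from agent to agent along the pipeline."""
    question: str
    schema: dict = field(default_factory=dict)
    sub_questions: list = field(default_factory=list)
    sql: str = ""

def schema_extraction_agent(ctx):
    # Hypothetical stub: a real agent would introspect the live database.
    ctx.schema = {"employees": ["id", "name", "department"]}
    return ctx

def query_decomposition_agent(ctx):
    # Trivial decomposition: treat the question as a single sub-question.
    ctx.sub_questions = [ctx.question]
    return ctx

def sql_generation_agent(ctx):
    # Placeholder generation; a model call would go here.
    ctx.sql = "SELECT COUNT(*) FROM employees WHERE department = 'sales'"
    return ctx

def run_pipeline(question, agents):
    """Run each agent in order over a shared, progressively enriched context."""
    ctx = QueryContext(question=question)
    for agent in agents:
        ctx = agent(ctx)
    return ctx

ctx = run_pipeline(
    "How many employees are in the sales department?",
    [schema_extraction_agent, query_decomposition_agent, sql_generation_agent],
)
```

Because each stage only touches its own fields, an agent can be debugged or swapped out in isolation, which is the interpretability benefit the modular design is after.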
Agentic systems utilize a Schema Extraction Agent to parse and interpret the structure of the underlying database. This agent identifies tables, columns, data types, and relationships, creating a metadata representation used by subsequent agents. Simultaneously, a Query Decomposition Agent addresses complex natural language queries by breaking them down into a series of simpler, more manageable sub-queries. This decomposition process facilitates accurate translation into SQL by isolating individual informational requests and reducing ambiguity, particularly in scenarios involving multiple conditions or aggregations.
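For a SQLite database, the metadata a Schema Extraction Agent needs can be gathered with standard catalog queries. This is a minimal sketch, assuming SQLite as the backend; the paper does not specify the extraction mechanism, and production systems would also capture foreign-key relationships.

```python
import sqlite3

def extract_schema(conn):
    """Return {table: [(column_name, declared_type), ...]} for every user table."""
    schema = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
        schema[table] = [(c[1], c[2]) for c in cols]
    return schema

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, department TEXT)")
schema = extract_schema(conn)
```

The resulting dictionary is exactly the kind of metadata representation subsequent agents can consume for decomposition and generation.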
The SQL Generation Agent is responsible for translating the decomposed query plan into a valid SQL query string, utilizing the schema information provided by the Schema Extraction Agent. Following query construction, the Query Validation & Execution Agent performs a series of checks to ensure syntactic and semantic correctness. This includes parsing the generated SQL, validating table and column references against the database schema, and, crucially, executing the query against the database to confirm functionality and prevent runtime errors. The agent then returns the query results, or an error message if validation or execution fails, providing a closed-loop system for query refinement.
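The validate-then-execute behavior can be sketched as follows, again assuming SQLite. Using `EXPLAIN` as a side-effect-free syntax and schema-reference check is one plausible implementation choice, not necessarily the paper's; the key point is the closed loop, returning either rows or an error message that upstream agents can act on.

```python
import sqlite3

def validate_and_execute(conn, sql):
    """Check a query without side effects, then run it.

    Returns (rows, None) on success or (None, error_message) on failure,
    giving upstream agents a concrete signal for refinement.
    """
    try:
        conn.execute(f"EXPLAIN {sql}")  # syntax + table/column reference check
    except sqlite3.Error as e:
        return None, f"validation failed: {e}"
    try:
        return conn.execute(sql).fetchall(), None
    except sqlite3.Error as e:
        return None, f"execution failed: {e}"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, department TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [(1, "sales"), (2, "sales"), (3, "hr")])
rows, err = validate_and_execute(
    conn, "SELECT COUNT(*) FROM employees WHERE department = 'sales'")
bad_rows, bad_err = validate_and_execute(conn, "SELECT * FROM missing_table")
```

Note that the failure path returns a structured error rather than raising, so the refinement loop can consume it directly.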
Refinement Through Iteration: Ensuring Accurate Results
The Query Validation & Execution Agent utilizes an Iterative Refinement process to enhance query accuracy by systematically identifying and correcting errors and ambiguities. This involves an initial query execution, followed by analysis of the results to detect potential issues such as semantic misunderstandings or incomplete information. The agent then reformulates the query based on this analysis, repeating the execution and validation cycle until a satisfactory result is achieved or a predetermined iteration limit is reached. This iterative approach allows the system to progressively improve its understanding of the user’s intent and generate more accurate responses, even in the presence of complex or poorly defined queries.
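The execute-analyze-reformulate cycle above can be expressed as a bounded loop that feeds each execution error back into the next generation attempt. The toy generator and executor below are hypothetical stand-ins for model and database calls, included only so the control flow is runnable.

```python
def iterative_refinement(generate, execute, max_iters=3):
    """Generate a query, execute it, and feed errors back into generation
    until execution succeeds or the iteration budget is exhausted."""
    feedback = None
    sql = ""
    for attempt in range(1, max_iters + 1):
        sql = generate(feedback)          # feedback is None on the first pass
        result, error = execute(sql)
        if error is None:
            return sql, result, attempt
        feedback = error                  # next attempt sees the error message
    return sql, None, max_iters

def toy_generate(feedback):
    # Hypothetical generator: corrects a typo once it sees an error message.
    return "SELECT 1" if feedback else "SELEC 1"

def toy_execute(sql):
    return ([1], None) if sql == "SELECT 1" else (None, "syntax error")

sql, result, attempts = iterative_refinement(toy_generate, toy_execute)
```

The iteration cap matters in practice: without it, a persistently ambiguous question would burn model calls indefinitely.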
System performance is quantitatively assessed on the BIRD benchmark, a standardized evaluation suite for NL2SQL systems. Current results show an Execution Accuracy (EX) of 47.78%, the percentage of queries whose execution returns the correct result. The system also achieves a Valid Efficiency Score (VES) of 51.05%, a BIRD metric that weights each correctly executed query by its runtime efficiency relative to the reference query. Together these metrics provide a baseline for tracking improvements in both query correctness and execution efficiency.
The system architecture is designed to leverage Small Language Models (SLMs) for query resolution, achieving a 67% success rate in resolving queries locally without requiring more computationally expensive models. This localized processing significantly reduces operational costs by minimizing reliance on external APIs or larger model inferences. The use of SLMs for a majority of queries represents a key optimization, directly impacting the overall cost-effectiveness of the query processing pipeline and enabling scalability for high-volume applications.
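The SLM-first routing policy can be sketched as a two-tier dispatcher: attempt the cheap model, accept its output if it passes a check, and escalate to the LLM otherwise. The acceptance test here is a deliberately naive placeholder (the paper's 67% local-resolution figure implies a more substantive check, such as successful validation and execution).

```python
def route_query(question, slm, llm, accept):
    """Try the small model first; escalate to the LLM only when the
    SLM's draft fails the acceptance check. Returns (sql, tier)."""
    draft = slm(question)
    if accept(draft):
        return draft, "slm"
    return llm(question), "llm"

# Hypothetical stand-ins for the two model tiers.
slm = lambda q: ("SELECT COUNT(*) FROM employees"
                 if "how many" in q.lower() else "")
llm = lambda q: "SELECT name FROM employees ORDER BY salary DESC LIMIT 1"
accept = lambda sql: sql.startswith("SELECT")

sql1, tier1 = route_query("How many employees are there?", slm, llm, accept)
sql2, tier2 = route_query("Who earns the most?", slm, llm, accept)
```

Under this policy the expensive model is only billed for the minority of queries the small model cannot resolve, which is where the reported cost savings come from.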
The Evolving Landscape: Beyond Basic Functionality
Recent advancements in agentic systems for SQL query generation aren’t limited to a single approach; instead, a wave of related systems – including MAC-SQL, DAIL-SQL, and DIN-SQL – explore specialized features and optimization techniques. These systems move beyond single-shot query generation by incorporating mechanisms such as question decomposition, multi-step reasoning, and self-correction. MAC-SQL, for example, coordinates multiple collaborating agents – a schema selector, a question decomposer, and a query refiner – while DAIL-SQL focuses on systematic prompt engineering, studying how in-context examples are selected and organized. DIN-SQL decomposes the task into classification, generation, and self-correction stages. This diversification demonstrates a clear trend toward tailored solutions, each addressing specific challenges in translating natural language into efficient and accurate database queries, ultimately enhancing performance and reducing computational cost.
The efficacy of SQL generation agents is significantly enhanced through the implementation of schema-aware prompting, a technique designed to minimize instances of “hallucination” – the generation of invalid or nonsensical SQL code. By explicitly incorporating database schema information directly into the prompts guiding the agent, the system gains a more robust understanding of table structures, column types, and relationships. This focused approach not only reduces errors but also demonstrably improves the overall quality of generated queries, ensuring they are syntactically correct and logically aligned with the intended database operations. The result is a more reliable and accurate system capable of consistently translating natural language questions into functional SQL commands.
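One common way to realize schema-aware prompting is to serialize the extracted metadata as `CREATE TABLE` statements inside the prompt, so the model grounds its SQL in real identifiers rather than inventing them. The exact prompt layout below is an assumption for illustration, not the paper's template.

```python
def schema_aware_prompt(question, schema):
    """Build a prompt that embeds table and column metadata ahead of the
    question, constraining generation to identifiers that actually exist."""
    lines = ["Database schema:"]
    for table, columns in schema.items():
        cols = ", ".join(f"{name} {ctype}" for name, ctype in columns)
        lines.append(f"  CREATE TABLE {table} ({cols});")
    lines.append(f"Question: {question}")
    lines.append("Answer with a single SQL query.")
    return "\n".join(lines)

prompt = schema_aware_prompt(
    "How many employees are in the sales department?",
    {"employees": [("id", "INTEGER"), ("department", "TEXT")]},
)
```

Because the model never has to guess column names or types, a whole class of hallucinated-identifier errors is cut off at the prompt level.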
Significant economic benefits accompany the implementation of advanced agentic systems in SQL query generation. The hybrid SLM-first architecture achieves over 90% cost reduction when contrasted with traditional large language model (LLM)-only approaches. This substantial decrease in expenditure is driven by optimized prompting strategies and reduced reliance on computationally expensive LLM inferences. The average cost per query, after incorporating these advancements, has been measured at just $0.094, indicating a pathway towards significantly more affordable and scalable data interaction. These findings highlight the potential for businesses and researchers to unlock valuable insights from their databases while minimizing operational expenses.
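Taking the article's two figures at face value, a back-of-the-envelope check shows what they jointly imply: if $0.094 per query represents a 90% reduction, the implied LLM-only baseline is roughly $0.94 per query. This is derived arithmetic from the stated numbers, not a figure reported in the paper.

```python
avg_hybrid_cost = 0.094   # reported average cost per query, USD
reduction = 0.90          # reported cost reduction vs. an LLM-only baseline

# Implied LLM-only cost per query under these two assumptions.
implied_llm_only = avg_hybrid_cost / (1 - reduction)
```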
The pursuit of an agentic system for schema-aware NL2SQL generation exemplifies a commitment to distillation, a principle echoing John von Neumann’s observation: “If people do not believe that mathematics is simple, it is only because they do not realize how elegantly nature operates.” This work prioritizes cost efficiency through the strategic deployment of Small Language Models, accepting a measured reliance on Large Language Models only when necessary. The architecture isn’t about maximizing complexity; rather, it’s about achieving functionality with the minimal necessary components, mirroring a natural elegance. The focus on a hybrid approach, skillfully balancing SLMs and LLMs, underscores the value of parsimony in design: what remains is precisely what matters.
Further Refinements
The demonstrated efficiency gains, achieved through judicious delegation to smaller models, merely highlight the ongoing extravagance inherent in current large language model deployment. The core issue isn’t simply cost, though that is a practical impediment, but the unnecessary expenditure of computational resources on tasks readily handled by more focused architectures. Future work must address the limitations of schema awareness; current systems still exhibit fragility when confronted with complex or ambiguous database structures. The present study offers a functional, if preliminary, solution; a truly robust system will necessitate a more formalized representation of relational data, moving beyond the reliance on textual prompting.
A critical, and often overlooked, constraint lies in the evaluation metrics. Current benchmarks prioritize exact match accuracy, failing to adequately capture semantic equivalence. A query that achieves the intended result, even with syntactic variations, should be recognized as valid. The pursuit of perfect form obscures functional utility. The field would benefit from metrics that prioritize correctness over conformity.
Ultimately, this work exemplifies a necessary trend: a move away from monolithic models toward hybrid systems that leverage the strengths of diverse architectures. Emotion is a side effect of structure, and the current obsession with scale obscures a simpler truth: clarity is compassion for cognition. The path forward lies not in building ever-larger models, but in building smarter systems.
Original article: https://arxiv.org/pdf/2603.18018.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/
2026-03-22 17:51