Natural Language to SQL with LLMs

Introduction

Natural language to SQL (NL2SQL) is one of the most impactful applications of LLMs in data analytics. It enables non-technical users to query databases using plain English, democratizing data access across organizations. While early NL2SQL systems were fragile, modern LLMs can generate accurate SQL for complex analytical queries — when properly configured with schema context, safety guards, and validation.

How NL2SQL Works

The core architecture is straightforward:

User Question → Context Assembly → LLM SQL Generation → Query Validation → Execution → Result Formatting

The critical step is context assembly — providing the LLM with enough database metadata to generate correct SQL.

Schema Context

The LLM needs to understand your database schema. A well-structured schema prompt includes:

CREATE TABLE customers (

customer_id INTEGER PRIMARY KEY,

first_name VARCHAR(100),

last_name VARCHAR(100),

email VARCHAR(255),

signup_date DATE,

country VARCHAR(50),

lifetime_value DECIMAL(10,2)

);

CREATE TABLE orders (

order_id INTEGER PRIMARY KEY,

customer_id INTEGER REFERENCES customers(customer_id),

order_date TIMESTAMP,

total_amount DECIMAL(10,2),

status VARCHAR(20) -- pending, shipped, delivered, cancelled

);

CREATE TABLE order_items (

item_id INTEGER PRIMARY KEY,

order_id INTEGER REFERENCES orders(order_id),

product_name VARCHAR(200),

quantity INTEGER,

unit_price DECIMAL(10,2)

);

Key additions for better accuracy:

Column descriptions: -- customer's two-letter ISO country code
Value examples: -- status options: pending, shipped, delivered, cancelled
Relationship hints: -- join via customer_id to get customer details
Index hints: -- indexed on order_date for range queries

def build_schema_context(tables):

context_parts = []

for table in tables:

ddl = f"CREATE TABLE {table.name} (\n"

for col in table.columns:

ddl += f" {col.name} {col.type} -- {col.description}\n"

ddl += ");"

context_parts.append(ddl)

Add sample rows for context on data patterns

if table.sample_rows:

context_parts.append(f"-- Sample row: {table.sample_rows[0]}")

return "\n\n".join(context_parts)

Few-Shot Examples

For complex schemas, provide question-SQL pairs relevant to the query:

few_shot_examples = [

{

"question": "Show me the top 5 customers by total spending",

"query": """

SELECT

c.first_name || ' ' || c.last_name AS customer_name,

SUM(o.total_amount) AS total_spent

FROM customers c

JOIN orders o ON c.customer_id = o.customer_id

WHERE o.status != 'cancelled'

GROUP BY c.customer_id, c.first_name, c.last_name

ORDER BY total_spent DESC

LIMIT 5

"""

{

"question": "What was the monthly revenue for 2025?",

"query": """

SELECT

DATE_TRUNC('month', o.order_date) AS month,

SUM(o.total_amount) AS revenue

FROM orders o

WHERE o.status = 'delivered'

AND o.order_date >= '2025-01-01'

AND o.order_date < '2026-01-01'

GROUP BY month

ORDER BY month

"""

}

]

Dynamic example retrieval using embedding similarity improves accuracy significantly. Store question-query pairs in a vector database and retrieve the most relevant ones for each new question.

SQL Generation Prompt

The complete prompt combines schema, examples, and safety instructions:

def build_nl2sql_prompt(question, schema, examples, dialect="postgresql"):

prompt = f"""You are a {dialect} SQL expert. Convert natural language questions into SQL queries.

Database Schema:

{schema}

Rules:

1\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\. Return ONLY the SQL query, no explanations

2\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\. Use proper JOINs based on foreign key relationships

3\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\. Handle NULLs appropriately with COALESCE or IS NULL checks

4\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\. Use appropriate aggregate functions (COUNT, SUM, AVG) when needed

5\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\. Always include filters to exclude irrelevant data

6\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\. Use LIMIT for result size control

7\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\. Order results when ordering is implied

8\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\. Never use DELETE, UPDATE, INSERT, DROP, or ALTER statements

Examples:

"""

for ex in examples:

prompt += f"\nQ: {ex['question']}\nSQL: {ex['query']}\n"

prompt += f"\nQ: {question}\nSQL:"

return prompt

Query Validation

Before executing generated SQL, validate it:

def validate_sql(query, allowed_tables):

errors = []

Safety check: no mutations

forbidden_keywords = ["DELETE", "UPDATE", "INSERT", "DROP", "ALTER", "TRUNCATE", "CREATE"]

for keyword in forbidden_keywords:

if keyword in query.upper():

errors.append(f"Forbidden operation: {keyword}")

Check table references

parsed = sqlparse.parse(query)[0]

used_tables = extract_tables(parsed)

for table in used_tables:

if table not in allowed_tables:

errors.append(f"Unauthorized table: {table}")

Syntax validation

try:

conn = get_dummy_connection(dialect)

conn.execute(f"EXPLAIN {query}")

except Exception as e:

errors.append(f"SQL syntax error: {str(e)}")

return errors

Advanced Techniques

Self-Correction Loop

When the generated SQL fails or returns empty results, the system can self-correct:

def nl2sql_with_correction(question, schema, max_attempts=3):

for attempt in range(max_attempts):

query = generate_sql(question, schema)

errors = validate_sql(query)

if not errors:

try:

results = execute_sql(query)

if results:

return format_results(question, query, results)

else:

feedback = "The query returned no results. Try a broader search."

except Exception as e:

feedback = f"Execution error: {str(e)}"

else:

feedback = f"Validation errors: {', '.join(errors)}"

question = f"Original question: {question}\nPrevious attempt failed: {feedback}\nPlease fix the SQL query."

Schema Linking

For large schemas with many tables, first identify relevant tables before generating SQL:

def identify_relevant_tables(question, all_tables, table_descriptions):

prompt = f"""

Question: {question}

Available tables with descriptions:

{table_descriptions}

List only the tables needed to answer this question, comma-separated:

"""

response = call_llm(prompt)

return [t.strip() for t in response.split(",")]

This dramatically reduces context size and improves accuracy on large databases.

Production Deployment

Security : Always use a read-only database user, set statement timeouts, and implement rate limiting per user.

Caching : Cache common NL2SQL conversions to reduce LLM costs, but invalidate when schema changes.

Monitoring : Log all generated queries and user feedback. Track query success rate, execution time, and correction frequency.

User experience : Show the generated SQL to users (with a "view query" option), and allow them to provide feedback on results.

Conclusion

Natural language to SQL with LLMs is production-ready for analytical queries. The key success factors are high-quality schema context, relevant few-shot examples, robust query validation, and a self-correction loop. Start with a limited set of tables, measure query accuracy, and expand as you gain confidence. This technology is transforming data access, making self-serve analytics a reality for non-technical team members.