Schema Design Patterns: Normalization, Denormalization, Naming Conventions

Schema Design Patterns: Normalization, Denormalization, Naming Conventions

Schema design is the foundation of application performance and maintainability. Good schemas are intuitive, performant, and resilient to change. This article covers normalization trade-offs, denormalization strategies, and practical naming conventions.

Normalization

Normalization eliminates data redundancy through a series of normal forms. Most production schemas aim for Third Normal Form (3NF).

First Normal Form (1NF)

Each column contains atomic values. No repeating groups or arrays.

\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\-- Violates 1NF: multi-valued column

CREATE TABLE orders (

id INTEGER PRIMARY KEY,

product_ids TEXT -- '1,2,3' as comma-separated values

);

\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\-- 1NF compliant: separate rows per product

CREATE TABLE order_items (

order_id INTEGER REFERENCES orders(id),

product_id INTEGER REFERENCES products(id),

quantity INTEGER NOT NULL,

PRIMARY KEY (order_id, product_id)

);

Second Normal Form (2NF)

1NF + every non-key column depends on the whole primary key (relevant for composite keys):

\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\-- Violates 2NF: product_name depends only on product_id, not on order_id

CREATE TABLE order_items (

order_id INTEGER,

product_id INTEGER,

product_name TEXT, -- depends on product_id only

quantity INTEGER,

PRIMARY KEY (order_id, product_id)

);

\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\-- 2NF compliant: product_name moved to products table

CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);

Third Normal Form (3NF)

2NF + no transitive dependencies (non-key columns depend only on the primary key):

\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\-- Violates 3NF: category_name depends on category_id, not on order_id

CREATE TABLE products (

id INTEGER PRIMARY KEY,

name TEXT,

category_id INTEGER,

category_name TEXT -- transitively depends on category_id

);

\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\-- 3NF compliant

CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);

CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, category_id INTEGER);

Denormalization

Denormalization intentionally adds redundancy for performance. Use it judiciously.

Read-Optimized Denormalization

\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\-- Normalized form

SELECT COUNT(*) FROM orders WHERE user_id = 42;

\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\-- Denormalized: add order_count to users table

ALTER TABLE users ADD COLUMN order_count INTEGER DEFAULT 0;

\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\-- Keep it in sync (via trigger or application logic)

UPDATE users SET order_count = order_count + 1 WHERE id = NEW.user_id;

Pre-Joined Tables

For read-heavy reporting, pre-join frequently accessed data:

CREATE TABLE order_summaries (

order_id INTEGER PRIMARY KEY,

user_email TEXT,

total NUMERIC,

item_count INTEGER,

first_item_name TEXT,

created_at TIMESTAMPTZ

);

\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\-- Refresh periodically

INSERT INTO order_summaries

SELECT o.id, u.email, o.total, COUNT(oi.id),

MIN(p.name), o.created_at

FROM orders o

JOIN users u ON u.id = o.user_id

JOIN order_items oi ON oi.order_id = o.id

JOIN products p ON p.id = oi.product_id

GROUP BY o.id, u.email, o.total, o.created_at

ON CONFLICT (order_id) DO NOTHING;

Naming Conventions

Consistent naming conventions reduce cognitive load and make schemas self-documenting.

Tables

  • Plural nouns : users, orders, products, order_items

  • Join tables : Combine table names: user_roles, product_categories

  • Avoid reserved words : Never name a table user, order, or group without quoting

Columns

  • Primary keys : id (singular, generic)

  • Foreign keys : {table}_id (e.g., user_id, product_id)

  • Timestamps : created_at, updated_at, deleted_at

  • Boolean flags : is_active, has_billing, email_verified

Indexes

\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\-- Naming convention: idx_{table}_{column(s)}

CREATE INDEX idx_users_email ON users (email);

CREATE INDEX idx_orders_user_date ON orders (user_id, created_at);

\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\-- Unique indexes: uq_{table}_{column(s)}

CREATE UNIQUE INDEX uq_users_email ON users (email);

Timestamp Handling

Timestamps are a common source of bugs in schema design.

Always Use TIMESTAMPTZ

CREATE TABLE events (

id BIGSERIAL PRIMARY KEY,

occurred_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),

created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),

updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()

);

TIMESTAMPTZ stores values in UTC internally and converts to the session timezone on display. Using TIMESTAMP without timezone invites timezone-related bugs.

Automatic Updated At

CREATE OR REPLACE FUNCTION trigger_set_updated_at()

RETURNS TRIGGER AS $$

BEGIN

NEW.updated_at = NOW();

RETURN NEW;

END;

$$ LANGUAGE plpgsql;

CREATE TRIGGER set_updated_at

BEFORE UPDATE ON users

FOR EACH ROW

EXECUTE FUNCTION trigger_set_updated_at();

Soft Deletes

ALTER TABLE users ADD COLUMN deleted_at TIMESTAMPTZ;

\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\-- Queries must always filter:

SELECT * FROM users WHERE deleted_at IS NULL;

\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\-- Create a partial index for active users

CREATE INDEX idx_users_active ON users (email) WHERE deleted_at IS NULL;

Schema Anti-Patterns

Entity-Value-Attribute (EAV)

\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\-- Anti-pattern: EAV makes queries painful

CREATE TABLE product_attributes (

product_id INTEGER,

attribute_name TEXT,

attribute_value TEXT,

PRIMARY KEY (product_id, attribute_name)

);

EAV requires complex pivoting for every query. Use JSONB or add columns instead.

Polymorphic Associations

\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\-- Anti-pattern: polymorphic foreign key

CREATE TABLE comments (

id BIGSERIAL PRIMARY KEY,

commentable_type TEXT, -- 'Post' or 'Video'

commentable_id INTEGER -- No foreign key constraint!

);

Use separate join tables or inheritance patterns instead.

Oversized VARCHAR

\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\-- Bad: arbitrary limit

CREATE TABLE users (name VARCHAR(10));

\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\-- Better: use TEXT with application-level validation

CREATE TABLE users (name TEXT);

PostgreSQL treats VARCHAR(n) and TEXT identically internally, but arbitrary limits cause unnecessary application bugs.

Practical Schema Checklist

  • [ ] Primary key on every table (BIGSERIAL or UUID)

  • [ ] Foreign keys with indexes on referencing columns

  • [ ] created_at and updated_at timestamps

  • [ ] NOT NULL on columns that should never be null

  • [ ] Check constraints for domain validation

  • [ ] Unique constraints for business-unique columns

  • [ ] Consistent naming (plurals, _id suffix for FKs)

  • [ ] TIMESTAMPTZ everywhere (never TIMESTAMP)

  • [ ] Indexes match query patterns (verified with EXPLAIN)

  • [ ] Appropriate normalization level (3NF typically, denormalize knowingly)

Schema design is a long-term investment. A well-designed schema reduces bugs, makes queries faster, and makes the codebase easier for new developers to understand. Invest the time upfront; the payoff compounds over years of maintenance.