Full-Text Search in PostgreSQL: tsvector, tsquery, GIN Indexes

PostgreSQL's full-text search (FTS) provides built-in text search capabilities without external dependencies. While not as feature-rich as Elasticsearch or Meilisearch, it handles a large class of search needs efficiently.
The FTS Pipeline
Full-text search in PostgreSQL follows this flow:
- Parsing : Break text into tokens (lexemes).
2\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\. Normalization : Convert tokens to a standard form via a dictionary (stemming, stop words). 3\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\. Indexing : Store the normalized tokens in a tsvector. 4\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\. Querying : Match a tsquery against the tsvector using a GIN index.
tsvector and tsquery
tsvector
A tsvector is a sorted list of distinct lexemes with positional information:
SELECT to_tsvector('english',
'The quick brown fox jumps over the lazy dog');
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\-- 'brown':3 'dog':9 'fox':4 'jump':5 'lazy':8 'quick':2
Notice "jumps" became "jump" (stemming), and "the", "over" are removed (stop words).
tsquery
A tsquery represents a search query:
SELECT to_tsquery('english', 'fox & dog');
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\-- 'fox' & 'dog'
SELECT to_tsquery('english', 'jump | run');
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\-- 'jump' | 'run'
SELECT to_tsquery('english', '!cat');
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\-- !'cat'
SELECT to_tsquery('english', 'quick <-> brown');
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\-- 'quick' <-> 'brown' (adjacency: quick followed by brown)
Basic Search
SELECT title, body
FROM articles
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'database & performance');
Creating a Search Column
For production use, store the tsvector in a generated column to avoid repeated conversion:
ALTER TABLE articles
ADD COLUMN search_vector tsvector
GENERATED ALWAYS AS (
to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''))
) STORED;
CREATE INDEX idx_articles_search ON articles USING GIN (search_vector);
Now queries become:
SELECT title, body
FROM articles
WHERE search_vector @@ to_tsquery('english', 'postgresql & indexing');
Search Ranking
PostgreSQL offers ranking functions to sort results by relevance:
SELECT title,
ts_rank(search_vector, query) AS rank
FROM articles,
to_tsquery('english', 'postgresql & performance') AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;
ts_rank considers term frequency (TF) and inverse document frequency (IDF). A variant ts_rank_cd uses cover density, which rewards terms that appear close together:
SELECT title,
ts_rank_cd(search_vector, query) AS rank
FROM articles, to_tsquery('english', 'database & indexing') AS query
WHERE search_vector @@ query
ORDER BY rank DESC;
Highlighting
SELECT title,
ts_headline('english', body, query,
'StartSel=, StopSel=') AS highlighted
FROM articles, to_tsquery('english', 'performance & tuning') AS query
WHERE search_vector @@ query;
Multi-Column and Weighted Search
Assign different weights to columns for ranking:
ALTER TABLE articles ADD COLUMN search_weighted tsvector
GENERATED ALWAYS AS (
setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
setweight(to_tsvector('english', coalesce(body, '')), 'B')
) STORED;
CREATE INDEX idx_articles_weighted ON articles USING GIN (search_weighted);
The weight function ts_rank can use these weights:
SELECT title,
ts_rank('{0.1, 0.2, 0.4, 1.0}', search_weighted, query) AS rank
FROM articles, to_tsquery('english', 'indexing') AS query
WHERE search_weighted @@ query
ORDER BY rank DESC;
Dictionaries and Custom Configurations
PostgreSQL supports multiple text search dictionaries:
-
english_stem: Snowball stemmer for English. -
simple: Lowercase conversion only. -
unaccent: Remove diacritics (requires extension). -
thesaurus: Synonym expansion.
Create a custom configuration:
CREATE TEXT SEARCH CONFIGURATION my_search (COPY = english);
CREATE TEXT SEARCH DICTIONARY my_thesaurus (
TEMPLATE = thesaurus,
DICTFILE = 'my_thesaurus.ths'
);
ALTER TEXT SEARCH CONFIGURATION my_search
ALTER MAPPING FOR asciiword
WITH my_thesaurus, english_stem;
SELECT to_tsvector('my_search', 'The database is performing well');
PostgreSQL vs Elasticsearch
| Aspect | PostgreSQL FTS | Elasticsearch | |--------|---------------|---------------| | Setup | Built-in, no extra service | Separate cluster, JVM | | Index freshness | Real-time within transaction | Near-real-time (refresh interval) | | Ranking | TF-IDF based | BM25, learning-to-rank | | Faceted search | GROUP BY with tsvector | Dedicated aggregation API | | Scale | Single node + replicas | Distributed by design | | Fuzzy search | pg_trgm extension | Built-in fuzzy matching | | Language support | 20+ languages via Snowball | 40+ language analyzers | | Query syntax | @@ tsquery operator | JSON-based Query DSL |
Performance Tuning
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\-- Analyze GIN index usage
SELECT relname, seq_scan, idx_scan
FROM pg_stat_all_indexes
WHERE indexrelname = 'idx_articles_search';
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\-- Track search index size
SELECT pg_size_pretty(pg_indexes_size('articles'));
For large datasets (millions of documents), consider:
- Partial indexes for recent documents if search is time-sensitive:
CREATE INDEX idx_recent_search ON articles USING GIN (search_vector)
WHERE published_at > now() - interval '30 days';
-
Partitioning the table by date and creating GIN indexes per partition.
-
Using
pg_trgmfor prefix/similarity matching as a complement to FTS:
CREATE INDEX idx_articles_title_trgm ON articles USING GIN (title gin_trgm_ops);
SELECT * FROM articles WHERE title % 'databas'; -- similarity search
PostgreSQL full-text search is adequate for the majority of applications: documentation sites, blog search, e-commerce product search, and knowledge bases. Consider dedicated search engines only when you need fuzzy matching, faceted aggregation across millions of documents, or distributed search at scale.
Enjoy this article? Share your thoughts, questions, or experiences in the comments below — your insights help other readers too.
Join the discussion ↓