Database Sharding: Strategies and Trade-offs

What is Sharding?

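Sharding partitions a database horizontally across multiple servers. Each shard holds a subset of the rows, so capacity can grow roughly in proportion to the number of shards, provided the load spreads evenly and most queries touch a single shard.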

Key-Based Sharding

Hash the shard key to determine the target shard:

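    import hashlib

    class KeyBasedShardManager:
        def __init__(self, num_shards=4):
            self.num_shards = num_shards
            # Shard is assumed to be defined elsewhere (for example, one connection per shard).
            self.shards = [Shard(i) for i in range(num_shards)]

        def get_shard(self, shard_key):
            # SHA-256 gives a stable hash; Python's built-in hash() is salted per process for strings.
            hash_val = int(hashlib.sha256(str(shard_key).encode()).hexdigest(), 16)
            shard_id = hash_val % self.num_shards
            return self.shards[shard_id]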
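To see how evenly this kind of hashing spreads keys, here is a minimal standalone sketch. It reuses the same SHA-256 and modulo logic but returns a shard index instead of a `Shard` object. The `shard_for` helper and the `user-N` keys are assumptions made up for the illustration, not part of the manager above.

```python
import hashlib
from collections import Counter

def shard_for(shard_key, num_shards=4):
    """Return the shard index for a key using a stable SHA-256 hash."""
    digest = hashlib.sha256(str(shard_key).encode()).hexdigest()
    return int(digest, 16) % num_shards

# Hypothetical keys, used only to illustrate the spread across shards.
keys = [f"user-{i}" for i in range(10_000)]
counts = Counter(shard_for(k) for k in keys)

# Print how many keys landed on each shard.
for shard_id in sorted(counts):
    print(shard_id, counts[shard_id])
```

Each shard should receive close to a quarter of the keys. Because the hash is deterministic, a given key always routes to the same shard while `num_shards` stays fixed.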

Range-Based Sharding

Assign each partition a contiguous range of shard-key values. The PostgreSQL example below partitions orders by month:

    CREATE TABLE orders (
        id BIGSERIAL,
        order_date DATE,
        total DECIMAL(10,2),
        PRIMARY KEY (id, order_date)
    ) PARTITION BY RANGE (order_date);

    -- The upper bound is exclusive, so this partition holds January 2026.
    CREATE TABLE orders_2026_01 PARTITION OF orders
        FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');
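The DDL above creates partitions inside one PostgreSQL database. When the ranges live on separate servers, the application (or a proxy) has to pick the shard itself. The following is a minimal sketch of that lookup, assuming monthly ranges with exclusive upper bounds; the shard names and the `shard_for_date` helper are hypothetical.

```python
import bisect
from datetime import date

# Hypothetical layout: (exclusive upper bound, shard name), sorted by bound.
RANGES = [
    (date(2026, 2, 1), "shard_2026_01"),
    (date(2026, 3, 1), "shard_2026_02"),
    (date(2026, 4, 1), "shard_2026_03"),
]
LOWER_BOUND = date(2026, 1, 1)  # inclusive start of the first range
UPPER_BOUNDS = [upper for upper, _ in RANGES]

def shard_for_date(order_date):
    """Return the shard whose range [lower, upper) contains order_date."""
    if order_date < LOWER_BOUND:
        raise KeyError(f"no shard covers {order_date}")
    # bisect_right finds the first upper bound strictly greater than the date.
    idx = bisect.bisect_right(UPPER_BOUNDS, order_date)
    if idx == len(RANGES):
        raise KeyError(f"no shard covers {order_date}")
    return RANGES[idx][1]

print(shard_for_date(date(2026, 1, 15)))  # falls in the January range
print(shard_for_date(date(2026, 2, 1)))   # first day of the February range
```

Keeping the bounds in a sorted list makes each lookup a binary search, and adding a new month only means appending one entry.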

Directory-Based Sharding

Keep an explicit lookup table that maps each shard key to its shard:

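    class DirectoryShardManager:
        def __init__(self):
            # Maps shard_key -> shard_id; held in memory here for illustration.
            self.directory = {}

        def map_key_to_shard(self, shard_key, shard_id):
            self.directory[shard_key] = shard_id

        def get_shard(self, shard_key):
            # Returns None when the key has no mapping yet.
            return self.directory.get(shard_key)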
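The flexibility of a directory shows up when a single key has to move, for example to relocate a hot tenant. The sketch below illustrates one possible order of operations: copy, repoint the directory, then delete. It uses in-memory dictionaries as stand-ins for real shards. The `put`, `get`, and `move` helpers are hypothetical, and the sketch ignores writes that arrive during the move, which a real system would have to handle.

```python
# Hypothetical in-memory stand-ins: shard id -> {key: row}.
shards = {0: {}, 1: {}}
directory = {}  # shard key -> shard id

def put(key, row, shard_id):
    """Store a row on the given shard and record the mapping."""
    shards[shard_id][key] = row
    directory[key] = shard_id

def get(key):
    """Look up the shard in the directory, then read the row from it."""
    shard_id = directory.get(key)
    if shard_id is None:
        raise KeyError(f"no shard mapping for {key!r}")
    return shards[shard_id][key]

def move(key, new_shard_id):
    """Relocate one key: copy the row, repoint the directory, drop the old copy."""
    old_shard_id = directory[key]
    if old_shard_id == new_shard_id:
        return
    shards[new_shard_id][key] = shards[old_shard_id][key]
    directory[key] = new_shard_id
    del shards[old_shard_id][key]

put("tenant-42", {"plan": "pro"}, shard_id=0)
move("tenant-42", new_shard_id=1)
print(directory["tenant-42"], get("tenant-42"))
```

Only the moved key is affected; every other mapping stays where it was, which is the main advantage over hash-based placement.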

Rebalancing

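When shards are added or removed, data must be redistributed. With plain modulo hashing, changing the shard count remaps most keys. Consistent hashing limits the movement to roughly the share of keys owned by the shard being added or removed. Tools like Vitess and Citus automate much of this process.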
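As a rough illustration of why consistent hashing helps, the following sketch builds a small hash ring with virtual nodes and counts how many keys change shards when a fifth shard joins. It then does the same for modulo placement. The `HashRing` class, the shard names, and the choice of 100 virtual nodes per shard are assumptions made for this example. They do not describe how Vitess or Citus work internally.

```python
import bisect
import hashlib

def stable_hash(value):
    """Hash any value to a large integer using SHA-256."""
    return int(hashlib.sha256(str(value).encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, shard_names, vnodes=100):
        self.vnodes = vnodes  # virtual nodes per shard smooth out the distribution
        self._points = []     # sorted ring positions
        self._owners = {}     # ring position -> shard name
        for name in shard_names:
            self.add_shard(name)

    def add_shard(self, name):
        """Place vnodes points for this shard on the ring."""
        for i in range(self.vnodes):
            point = stable_hash(f"{name}#{i}")
            bisect.insort(self._points, point)
            self._owners[point] = name

    def get_shard(self, key):
        """Return the owner of the first ring point clockwise from the key's hash."""
        if not self._points:
            raise LookupError("ring has no shards")
        idx = bisect.bisect_right(self._points, stable_hash(key)) % len(self._points)
        return self._owners[self._points[idx]]

keys = [f"user-{i}" for i in range(10_000)]  # hypothetical keys

ring = HashRing(["shard-0", "shard-1", "shard-2", "shard-3"])
before = {k: ring.get_shard(k) for k in keys}
ring.add_shard("shard-4")
after = {k: ring.get_shard(k) for k in keys}

moved_ring = sum(before[k] != after[k] for k in keys)
moved_modulo = sum(stable_hash(k) % 4 != stable_hash(k) % 5 for k in keys)
print("moved with ring:", moved_ring)
print("moved with modulo:", moved_modulo)
```

With the ring, the keys that move should amount to about one fifth of the total, and every one of them lands on the new shard. With modulo placement, around four fifths of the keys change shards when going from four shards to five.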

Conclusion

Choose key-based sharding for even distribution, range-based sharding for time-series data, and directory-based sharding for maximum flexibility. Whichever strategy you pick, design the shard key so that load spreads evenly, plan for rebalancing from the start, and avoid cross-shard queries where possible.