Skip to content

Shard Keys & Balancing

Shard Keys & Balancing

Selecting an optimal shard key is the single most important decision when designing a sharded cluster. A poorly chosen shard key can lead to hotspots, uneven data distribution, and irreversible performance degradation.

🏗️ 1. Selecting the Shard Key

The Shard Key is an indexed field or fields that MongoDB uses to distribute a collection’s data across the cluster.

Characteristics of an Ideal Shard Key

CharacteristicDescription
High CardinalityThe key should have a large number of unique values (e.g., user_id is better than country).
Even DistributionThe key should prevent “hotspots” where a single shard receives most of the writes.
Query Pattern AlignmentThe shard key should be a field frequently used in your queries to allow the mongos to route requests to specific shards.

Compound Shard Keys

In many cases, a single field is not sufficient. You can use a compound shard key to achieve both cardinality and alignment.

// Example: Compound shard key using user_id and timestamp
sh.shardCollection("app.activity", { "user_id": 1, "ts": 1 });

🚀 2. Chunks and Migrations

MongoDB splits data into Chunks based on the shard key’s value.

  • Default Size: Chunks are 64MB by default.
  • Splitting: When a chunk grows beyond its size limit, MongoDB automatically splits it into two smaller chunks.
  • Migration: If the number of chunks on one shard becomes significantly higher than on others, MongoDB will migrate chunks to rebalance the cluster.

⚡ 3. The Balancer

The Balancer is a background process that ensures an even distribution of chunks across all shards in the cluster.

  • Operation: When the balancer detects an imbalance, it moves chunks from the shard with the most chunks to the shard with the fewest.
  • Automatic: The balancer is enabled by default.

Managing the Balancer

You can schedule the balancer to run during specific windows to avoid impacting peak performance.

// Pymongo Example: Checking and managing the balancer
from pymongo import MongoClient

client = MongoClient('mongodb://mongos:27017/')
config_db = client['config']

# Check if the balancer is running
balancer_status = config_db.settings.find_one({"_id": "balancer"})
print(f"Balancer Status: {balancer_status}")

# Stop the balancer
client.admin.command("balancerStop")

🛡️ 4. Preventing Hotspots

A Hotspot occurs when a single shard receives a disproportionate amount of write traffic.

The “Monotonic” Trap

Using a monotonically increasing field (like an ObjectId or a timestamp) as a range-based shard key is a common mistake.

  • Impact: Every new document will have a higher key value, meaning all writes will hit the same shard (the one holding the upper range).
  • Solution: Use Hashed Sharding for monotonically increasing keys to distribute writes evenly across the cluster.
// Mongo Shell: Create a hashed shard key
sh.shardCollection("logs.events", { "timestamp": "hashed" });

💡 Best Practices

  1. Include Shard Key in Queries: If your query does not include the shard key, the mongos must broadcast the request to all shards, which is highly inefficient (“Scatter-Gather” query).
  2. Monitor Disk Space: Ensure all shards have comparable disk space and performance profiles.
  3. Use Compound Keys for Scale: If you need to shard by a low-cardinality field (like region), combine it with a high-cardinality field (like order_id).