Shard Keys & Balancing
Shard Keys & Balancing
Selecting an optimal shard key is the single most important decision when designing a sharded cluster. A poorly chosen shard key can lead to hotspots, uneven data distribution, and irreversible performance degradation.
🏗️ 1. Selecting the Shard Key
The Shard Key is an indexed field or fields that MongoDB uses to distribute a collection’s data across the cluster.
Characteristics of an Ideal Shard Key
| Characteristic | Description |
|---|---|
| High Cardinality | The key should have a large number of unique values (e.g., user_id is better than country). |
| Even Distribution | The key should prevent “hotspots” where a single shard receives most of the writes. |
| Query Pattern Alignment | The shard key should be a field frequently used in your queries to allow the mongos to route requests to specific shards. |
Compound Shard Keys
In many cases, a single field is not sufficient. You can use a compound shard key to achieve both cardinality and alignment.
// Example: Compound shard key using user_id and timestamp
sh.shardCollection("app.activity", { "user_id": 1, "ts": 1 });🚀 2. Chunks and Migrations
MongoDB splits data into Chunks based on the shard key’s value.
- Default Size: Chunks are 64MB by default.
- Splitting: When a chunk grows beyond its size limit, MongoDB automatically splits it into two smaller chunks.
- Migration: If the number of chunks on one shard becomes significantly higher than on others, MongoDB will migrate chunks to rebalance the cluster.
⚡ 3. The Balancer
The Balancer is a background process that ensures an even distribution of chunks across all shards in the cluster.
- Operation: When the balancer detects an imbalance, it moves chunks from the shard with the most chunks to the shard with the fewest.
- Automatic: The balancer is enabled by default.
Managing the Balancer
You can schedule the balancer to run during specific windows to avoid impacting peak performance.
// Pymongo Example: Checking and managing the balancer
from pymongo import MongoClient
client = MongoClient('mongodb://mongos:27017/')
config_db = client['config']
# Check if the balancer is running
balancer_status = config_db.settings.find_one({"_id": "balancer"})
print(f"Balancer Status: {balancer_status}")
# Stop the balancer
client.admin.command("balancerStop")🛡️ 4. Preventing Hotspots
A Hotspot occurs when a single shard receives a disproportionate amount of write traffic.
The “Monotonic” Trap
Using a monotonically increasing field (like an ObjectId or a timestamp) as a range-based shard key is a common mistake.
- Impact: Every new document will have a higher key value, meaning all writes will hit the same shard (the one holding the upper range).
- Solution: Use Hashed Sharding for monotonically increasing keys to distribute writes evenly across the cluster.
// Mongo Shell: Create a hashed shard key
sh.shardCollection("logs.events", { "timestamp": "hashed" });💡 Best Practices
- Include Shard Key in Queries: If your query does not include the shard key, the
mongosmust broadcast the request to all shards, which is highly inefficient (“Scatter-Gather” query). - Monitor Disk Space: Ensure all shards have comparable disk space and performance profiles.
- Use Compound Keys for Scale: If you need to shard by a low-cardinality field (like
region), combine it with a high-cardinality field (likeorder_id).