How to partition data across shards.

There comes a point where a single database server can no longer handle the volume of transactions and stored data. This is when partitioning, or sharding, comes into play: splitting the data across several servers improves scalability and, because a failure affects only part of the data, availability. Assume that you want to partition the data across 3 shards. All existing and incoming data must now reside on exactly one of these shards. How, then, is a shard chosen for each user?

Three strategies can be used when choosing a database shard:

  • Lookup strategy.

  • Range strategy.

  • Hash strategy.

Lookup strategy.

This is also known as key-value-based partitioning. Each key in the data is mapped to a shard, and the mapping is stored in a lookup table, so the shard for any key can be found directly. For instance, each userID in your system is mapped to one of these shards.

When a request arrives, you look up the shard in the mapping service (metadata server) and route the request to that shard server. The mapping is an additional component to maintain: the metadata server may itself grow huge over time and need to be partitioned.
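
The lookup flow can be sketched in a few lines of Python. This is only an illustration: the `ShardDirectory` class, the shard names, and the "least-loaded shard" placement rule are assumptions made for the example, and a real metadata server would persist the mapping rather than keep it in memory.

```python
# Hypothetical shard names, for illustration only.
SHARDS = ["shard-0", "shard-1", "shard-2"]


class ShardDirectory:
    """In-memory stand-in for the mapping service (metadata server)."""

    def __init__(self, shards):
        self._shards = list(shards)
        self._user_to_shard = {}  # userID -> shard name

    def assign(self, user_id):
        """Return the user's shard, placing new users on the least-loaded shard."""
        if user_id not in self._user_to_shard:
            counts = {shard: 0 for shard in self._shards}
            for shard in self._user_to_shard.values():
                counts[shard] += 1
            # Assumed placement rule: pick the shard with the fewest users.
            self._user_to_shard[user_id] = min(self._shards, key=lambda s: counts[s])
        return self._user_to_shard[user_id]

    def lookup(self, user_id):
        """Return the shard for a known user; raises KeyError for unknown users."""
        return self._user_to_shard[user_id]


directory = ShardDirectory(SHARDS)
for uid in (101, 102, 103, 104):
    directory.assign(uid)
print(directory.lookup(102))
```

Every request pays for one lookup in the directory, which is why the directory itself can become a bottleneck as it grows.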

Range strategy.

In this strategy, each shard holds a contiguous range of key values, so the shard can be computed from the key itself and no additional components need to be maintained. Range-based partitioning is well suited to systems that run frequent range queries, for instance, getting all returned orders for the last month.
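
As a rough sketch, range routing needs nothing more than a sorted list of boundaries. The boundaries, the shard names, and the use of integer user IDs as the partition key are all assumptions chosen for the example.

```python
from bisect import bisect_right

# Hypothetical boundaries: shard-0 holds IDs below 1000,
# shard-1 holds 1000-1999, shard-2 holds 2000 and above.
UPPER_BOUNDS = [1000, 2000]
SHARDS = ["shard-0", "shard-1", "shard-2"]


def shard_for(user_id):
    """Return the shard whose range contains user_id."""
    return SHARDS[bisect_right(UPPER_BOUNDS, user_id)]


def shards_for_range(low, high):
    """Return the shards that can hold keys in the inclusive range [low, high]."""
    first = bisect_right(UPPER_BOUNDS, low)
    last = bisect_right(UPPER_BOUNDS, high)
    return SHARDS[first:last + 1]


print(shard_for(1500))
print(shards_for_range(500, 1500))
```

The second function shows why range queries are cheap here: a query over a narrow range only has to visit the shards whose ranges overlap it, not every shard.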

One major problem with the range-based strategy is the hot shard problem. This occurs when one range holds far more data, or receives far more traffic, than the others. Even if the ranges start out balanced, the data can become less evenly distributed over time.
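
To make the skew concrete, here is a small sketch that counts keys per shard under range partitioning. The boundaries and the workload are invented for the illustration: a thin spread of older user IDs plus a burst of recent sign-ups that all fall into the highest range.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical boundaries: shard 0 < 1000 <= shard 1 < 2000 <= shard 2.
UPPER_BOUNDS = [1000, 2000]


def shard_index(user_id):
    """Return the index of the shard whose range contains user_id."""
    return bisect_right(UPPER_BOUNDS, user_id)


# Assumed workload: 30 older users spread evenly over the ID space,
# plus 500 recent sign-ups with IDs 2000-2499.
user_ids = list(range(0, 3000, 100)) + list(range(2000, 2500))

load = Counter(shard_index(uid) for uid in user_ids)
for index in sorted(load):
    print(f"shard {index}: {load[index]} users")
```

In this made-up workload the last shard ends up with 510 users while the other two hold 10 each, which is the hot shard pattern described above.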

Hash strategy.

This strategy is quite similar to the range strategy. Each shard is assigned a range of hash values, and no additional components are needed. When a request arrives, its key is run through a hash function, and the resulting hash value determines the shard that serves the request.
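
A minimal sketch of this idea, assuming a 32-bit hash space split into equal ranges and SHA-256 as the hash function (both are choices made for the example, not requirements). Python's built-in `hash()` is avoided because it is randomized per process for strings, so it would not route the same key to the same shard across restarts.

```python
import hashlib

NUM_SHARDS = 3
HASH_SPACE = 2 ** 32  # assumed size of the hash space


def stable_hash(key):
    """Map a key to an integer in [0, HASH_SPACE) that is stable across runs."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def shard_for(key):
    """Return the index of the shard whose hash range contains the key's hash.

    The hash space is split into NUM_SHARDS equal, contiguous ranges.
    """
    return stable_hash(key) * NUM_SHARDS // HASH_SPACE


counts = [0] * NUM_SHARDS
for user_id in range(3000):
    counts[shard_for(user_id)] += 1
print(counts)
```

Even though the user IDs here are sequential, hashing scatters them across the hash space, so the per-shard counts come out close to each other.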

Consistent hashing is a well-known algorithm of this type. Hash-based partitioning usually offers a more even data distribution than the other two strategies.
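
For illustration, a consistent hash ring can be sketched as a sorted list of points, where each key is served by the first point clockwise from its hash. The `HashRing` class, the shard names, and the choice of 100 virtual nodes per shard are assumptions for this example, not a reference implementation.

```python
import bisect
import hashlib


def _hash(key):
    """Stable 64-bit hash of a string."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Consistent hash ring with virtual nodes (hypothetical helper class)."""

    def __init__(self, shards, vnodes=100):
        # Each shard is placed on the ring at `vnodes` pseudo-random points.
        self._ring = sorted(
            (_hash(f"{shard}#{i}"), shard) for shard in shards for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def shard_for(self, key):
        """Return the shard that owns the first ring point after the key's hash."""
        index = bisect.bisect_right(self._points, _hash(str(key)))
        return self._ring[index % len(self._ring)][1]  # wrap around the ring


before = HashRing(["shard-0", "shard-1", "shard-2"])
after = HashRing(["shard-0", "shard-1", "shard-2", "shard-3"])

keys = range(10_000)
moved = [k for k in keys if before.shard_for(k) != after.shard_for(k)]
print(f"{len(moved) / len(keys):.0%} of keys moved after adding a shard")
```

The point of the ring is visible in the comparison at the end: when a fourth shard is added, only the keys that now belong to the new shard move, whereas changing the number of equal hash ranges would reshuffle many more keys.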

Thank you for your time, and see you in the next one.

Special shoutout to @happydecoder on X (formerly Twitter).