What parameter should be used as the partitioning key if we choose document partitioning?

Deepchand_Swami · March 12, 2024, 2:01am

What parameter should be used as the partitioning key if we choose document partitioning, so that the load is distributed evenly across nodes?

Ali_Hassan · March 15, 2024, 10:27am

Hi Deepchand,

The best parameter to use as the partitioning key depends on the scenario:

Document ID: A good choice if the document ID has high cardinality and its values are evenly distributed.
Timestamp: For time-series data where the query pattern typically involves a date range.
User ID or Tenant ID: Useful in multi-tenant systems where data locality for each tenant is essential, but make sure a few very large tenants don’t cause uneven distribution.
Geographic Location: For location-based services, partitioning by location or region can benefit localized queries.
Hash of a Field: A hash function applied to a high-cardinality field gives an even distribution. This is common when no single field naturally makes a good partition key.
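
To illustrate the hash option, here is a minimal Python sketch. It is not tied to any particular database: the helper name partition_for, the partition count of 4, and the "doc-N" IDs are all made up for the example. It uses hashlib rather than Python's built-in hash(), because hash() on strings is randomized per process and would not give a stable mapping.

```python
import hashlib
from collections import Counter


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition number using a stable hash.

    Hypothetical helper for illustration; a real system would normally
    use the partitioner built into the database or driver.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Take the first 8 bytes as an unsigned integer, then reduce it
    # to the range [0, num_partitions).
    return int.from_bytes(digest[:8], "big") % num_partitions


# Made-up data: 10,000 document IDs spread over 4 partitions.
counts = Counter(partition_for(f"doc-{i}", 4) for i in range(10_000))
for partition in sorted(counts):
    print(partition, counts[partition])
```

With a high-cardinality key such as a document ID, the per-partition counts should come out close to each other. Note that plain modulo hashing remaps most keys when the number of partitions changes, so this only shows the distribution idea, not a rebalancing strategy.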

When a suitable single partitioning key is hard to determine, consider a composite key made from multiple fields. This can improve distribution and align with complex query patterns.
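
A composite key could be sketched like this in Python, assuming a multi-tenant system where each document has a tenant ID and a document ID. The function name composite_partition, the field names, and the separator are assumptions for the example, not part of any specific product.

```python
import hashlib


def composite_partition(tenant_id: str, doc_id: str, num_partitions: int) -> int:
    """Pick a partition from a composite (tenant_id, doc_id) key.

    Hypothetical helper for illustration only.
    """
    # Join the fields with a separator that is unlikely to appear in the
    # values, so that ("ab", "c") and ("a", "bc") yield different keys.
    composite = f"{tenant_id}\x1f{doc_id}"
    digest = hashlib.sha256(composite.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# One large (made-up) tenant: its documents land on several partitions
# instead of piling up on a single node.
used = {composite_partition("tenant-42", f"doc-{i}", 4) for i in range(1000)}
print(sorted(used))
```

The trade-off is that a query covering a whole tenant now has to touch several partitions, so this fits best when most queries already specify both fields.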

Thank you.