educative.io

Sharding by userid or videoid

Since we have a huge number of new videos every day and our read load is extremely high, we need to distribute our data across multiple machines so that we can perform read/write operations efficiently. We have many options for sharding our data. Let’s go through the different sharding strategies one by one:

What’s the conclusion about sharding? Should it be done by userid or videoid? The writeup does not give a final choice.

@Dewey_Munoz Sharding by videoid is a better option than sharding by userid.

Reason: Sharding by videoid solves the popular-user problem that occurs when sharding by userid, where all of a popular user's videos land on one shard and overload it. A popular video can still create a hot spot in the videoid approach, but we have a solution for that: introduce a cache in front of the shards.
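To make the videoid approach concrete, here is a minimal sketch in Python. It assumes a fixed shard count, hash-modulo placement, and in-memory dicts standing in for the database shards and the cache. The names `NUM_SHARDS`, `shard_for`, `put_video`, `get_video`, and `hot_cache` are made up for illustration and do not come from the writeup.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count

# Hypothetical in-memory stand-ins: one dict per shard, plus a cache.
shards = [dict() for _ in range(NUM_SHARDS)]
hot_cache = {}  # unbounded here; a real cache would evict and invalidate


def shard_for(video_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a video ID to a shard index using a stable hash."""
    digest = hashlib.md5(video_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def put_video(video_id: str, metadata: dict) -> None:
    """Write video metadata to the shard that owns this video ID."""
    shards[shard_for(video_id)][video_id] = metadata


def get_video(video_id: str):
    """Read video metadata, serving repeated reads from the cache."""
    if video_id in hot_cache:
        return hot_cache[video_id]
    metadata = shards[shard_for(video_id)].get(video_id)
    if metadata is not None:
        # Only cache hits, so a later put_video is not hidden by a cached miss.
        hot_cache[video_id] = metadata
    return metadata
```

Because placement depends only on the video ID, one user's videos spread across shards, and repeated reads of a popular video are answered from the cache instead of hitting the same shard every time.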

How is this going to be efficient for video search? For example, I want to search “funny dog videos”. Are we going to run a query like select * from videos where title like '%funny dog videos%'?

For a keyword search query like this, the request has to go to every shard and match the keyword there, which leads to a scatter/gather scenario across shards. The reason is that the keyword can be associated with any video, so the videoid-based shard key cannot narrow down which shards to ask.
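A rough sketch of that scatter/gather pattern in Python, assuming each shard is just an in-memory list of (video_id, title) rows and the match is a case-insensitive substring test. `SHARDS`, `search_shard`, and `scatter_gather_search` are hypothetical names used only for this example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each holds (video_id, title) rows.
SHARDS = [
    [("v1", "Funny dog videos compilation"), ("v2", "Cooking pasta")],
    [("v3", "Cat jumps"), ("v4", "More funny dog videos")],
    [("v5", "Guitar lesson")],
]


def search_shard(rows, keyword):
    """Scan one shard for titles containing the keyword."""
    needle = keyword.lower()
    return [(vid, title) for vid, title in rows if needle in title.lower()]


def scatter_gather_search(shards, keyword):
    """Send the query to every shard in parallel, then merge the results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        # Scatter: one task per shard.
        partials = list(pool.map(lambda rows: search_shard(rows, keyword), shards))
    # Gather: flatten the per-shard hits and sort them by video ID.
    merged = [hit for part in partials for hit in part]
    return sorted(merged)
```

Calling `scatter_gather_search(SHARDS, "funny dog videos")` touches all three shards even though only two of them hold matches, which is the cost being described here.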