educative.io

How does sharding by user ID avoid the hot video problem?

Hi @Design_Gurus,

In the sharding section, you mention that if we shard by video ID, then: "This approach solves our problem of popular users but shifts it to popular videos." But the popular video problem also exists with user-ID-based sharding: if one of a user's videos becomes popular, then since all of that user's videos reside on one server, that server will be bombarded. So the problem exists in both schemes, yet your statement seems to imply that user-ID-based sharding doesn't have it?
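
To make the concern concrete, here is a minimal sketch, not taken from the article. The shard count, the `VIDEO_OWNER` catalogue, and the function names are all hypothetical. It routes the same request stream under both schemes and counts requests per shard; in both cases every request for the viral video lands on a single shard.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical shard count

def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard with a stable hash (md5 used only for determinism)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# Hypothetical catalogue: video ID -> ID of the user who uploaded it.
VIDEO_OWNER = {"v1": "alice", "v2": "alice", "v3": "bob", "v4": "carol"}

def load_by_shard(requests, strategy):
    """Count requests per shard when sharding by 'user' or by 'video'."""
    load = Counter()
    for video_id in requests:
        key = VIDEO_OWNER[video_id] if strategy == "user" else video_id
        load[shard_for(key)] += 1
    return load

# One viral video (v1) and a trickle of traffic for the others.
requests = ["v1"] * 1000 + ["v2", "v3", "v4"] * 10

by_user = load_by_shard(requests, "user")
by_video = load_by_shard(requests, "video")
print("sharded by user ID: ", dict(by_user))
print("sharded by video ID:", dict(by_video))
```

Under either scheme the busiest shard receives at least the 1000 requests for `v1`, because a single video always maps to exactly one shard.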

@rahul9 This article is definitely lacking, and your question is valid. A system such as Netflix or YouTube typically has two kinds of user IDs: the user ID of the content provider, which is directly associated with the video IDs of the videos they upload, and the user ID of the content consumer, which is only indirectly associated with video IDs, e.g. through the videos that user is currently watching.
For video uploads, you would want a decoupled design with services such as an upload service that handles uploading user content. This service could talk to a video service that stores the video content in object storage and in CDNs. The communication between the two could be optimized (beyond the scope of my write-up here), and a post-processing service would follow.
There would be tables for videos, content-providing users, the association between the two, content consumers, etc. Sharding has to be done separately for each table, based on which queries are executed most often.
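
As a rough sketch of what "sharding each table separately" could look like: the table names, column names, and shard count below are my own assumptions, not something from the article. Each table declares its own shard key, chosen to match the query it serves most often.

```python
import hashlib

NUM_SHARDS = 8  # hypothetical shard count

# Hypothetical tables, each with a shard key picked for its dominant query.
SHARD_KEYS = {
    "videos": "video_id",              # fetch metadata for one video
    "providers": "provider_id",        # load a content provider's profile
    "provider_videos": "provider_id",  # list all videos uploaded by a provider
    "consumers": "consumer_id",        # load a viewer's watch history
}

def shard_for(table: str, row: dict, num_shards: int = NUM_SHARDS) -> int:
    """Pick a shard for a row using the shard key configured for its table."""
    key_column = SHARD_KEYS[table]
    value = str(row[key_column])
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# Both association rows for one provider land on the same shard, so
# "list this provider's videos" touches only a single shard.
row_a = {"provider_id": "p42", "video_id": "v1"}
row_b = {"provider_id": "p42", "video_id": "v2"}
print(shard_for("provider_videos", row_a) == shard_for("provider_videos", row_b))  # True
```

Note that this only decides where rows live. A hot video is still a single key in the `videos` table, so read load for it would need caching or replication (and CDNs for the video bytes themselves) rather than a different shard key.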
This article really doesn't do its due diligence in differentiating any of the above, which is what causes all the confusion.