educative.io


Does the pre-generated news feed contain photos and videos as well?

  1. Where do we keep the UserNewsFeed table in the system? Is it on the cache server, or in a DB like Cassandra? Where does it appear in the system diagram?
  2. In the Facebook news feed design, you mentioned that we use a LinkedHashMap of <feeditemid, feeditem>. What is the difference between the UserNewsFeed table and this approach?
  3. Does FeedItem contain the actual media content, such as the image, or is it just metadata? If it is metadata, when do we pull/push the associated images/videos to the client?
  4. In the push model, it is mentioned that we push new posts to followers. Does this mean the media content is pushed to the follower's machine, or is just a notification sent to the user while the news feed is updated in the cache?
  5. What type of cache do we use for the data stored in file storage? Is it always Memcached or Redis?
  6. How does a CDN come into the picture for static data like videos and photos? When do we refer to the CDN in the request-response life cycle? Do we generate the news feed and then refer to the CDN to get the contents?
  7. Do we use the CDN only to stream videos?

Hi,

‘UserNewsFeed’ should be stored in the DB so that the system does not lose the news feed if a cache server fails. The news feed can also be kept in memory (cached) for faster access. ‘UserNewsFeed’ stores only the feed item IDs; the cache servers can load and store the actual feed items (with their videos/photos) from the respective tables. The LinkedHashMap approach discussed in the Facebook news feed design is how we keep the news feed in memory, i.e., after it has been loaded from the DB.
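To make the difference between the persisted ‘UserNewsFeed’ table and the in-memory LinkedHashMap concrete, here is a minimal Python sketch. Python has no LinkedHashMap, so `collections.OrderedDict` plays its role (insertion-ordered, constant-time lookup by key). The table contents and the names `user_news_feed_db`, `feed_items_db` and `load_feed_into_memory` are hypothetical, and plain dicts stand in for the real DB tables.

```python
from collections import OrderedDict

# Hypothetical stand-ins for persistent tables (assumption: a real system
# would use a DB such as Cassandra, not in-process dicts).
# UserNewsFeed: user_id -> list of feed item IDs, newest first.
user_news_feed_db = {
    "alice": [103, 102, 101],
}
# Feed item table: feed_item_id -> item content/metadata.
feed_items_db = {
    101: {"text": "hello", "media": []},
    102: {"text": "photo post", "media": ["photo-102.jpg"]},
    103: {"text": "video post", "media": ["video-103.mp4"]},
}


def load_feed_into_memory(user_id):
    """Build an ordered in-memory feed from the persisted feed item IDs.

    OrderedDict preserves insertion order, like Java's LinkedHashMap, so the
    feed keeps the order stored in UserNewsFeed while allowing lookup by ID.
    """
    feed = OrderedDict()
    for item_id in user_news_feed_db.get(user_id, []):
        item = feed_items_db.get(item_id)
        if item is not None:  # skip IDs whose item no longer exists
            feed[item_id] = item
    return feed


feed = load_feed_into_memory("alice")
print(list(feed.keys()))  # [103, 102, 101]
```

The DB copy survives a cache failure; the OrderedDict is rebuilt from it whenever a cache server needs the feed again.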

In the push model, we can push either the new items themselves or just a notification telling the user that new items are available; the user can then request (pull) them from the server. If we push the new items, the client only needs to refresh its view, because it has already received the data from the server.
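The two push variants can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a real delivery mechanism: `followers`, `server_items`, `client_inbox`, `publish` and `sync_client` are hypothetical names, and an in-process queue per user stands in for long-polling or WebSocket connections.

```python
from collections import defaultdict

# Hypothetical stand-ins (assumptions for illustration): a follower graph,
# the server-side item store, and one message queue per connected client.
followers = {"celebrity": ["alice", "bob"]}
server_items = {}
client_inbox = defaultdict(list)


def publish(author, item_id, item, push_full_item):
    """Store a new post and push it to every follower.

    push_full_item=True  -> the item itself is sent to each follower.
    push_full_item=False -> only a lightweight notification is sent.
    """
    server_items[item_id] = item
    for follower in followers.get(author, []):
        if push_full_item:
            client_inbox[follower].append({"type": "item", "id": item_id, "item": item})
        else:
            client_inbox[follower].append({"type": "notification", "id": item_id})


def sync_client(user_id):
    """Client side: pushed items are used directly; a notification
    triggers a pull of the item from the server."""
    new_items = []
    for msg in client_inbox.pop(user_id, []):
        if msg["type"] == "item":
            new_items.append(msg["item"])
        else:
            new_items.append(server_items[msg["id"]])  # simulated pull request
    return new_items


publish("celebrity", 201, {"text": "new video"}, push_full_item=False)
print(sync_client("alice"))  # [{'text': 'new video'}]
```

Pushing full items saves a round trip per item; sending notifications keeps fan-out messages small, which matters when an author has many followers.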

We can use any cache server, such as Memcached or Redis.

CDNs can be used for faster access to any type of data (a post, photo, video, file, etc.). For example, if a celebrity user posts a video, the video can be pushed to a CDN so that it becomes available to users and can be streamed quickly from multiple nearby locations.
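When media is served from a CDN, as in the celebrity-video example, one possible arrangement (assumed here, not prescribed by the thread) is for the feed item to carry only metadata plus media keys, and for the response to the client to turn those keys into CDN URLs that the client fetches directly. The `CDN_BASE_URL`, `FeedItem` fields and `to_client_payload` below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical CDN base URL (assumption; example.com is a placeholder domain).
CDN_BASE_URL = "https://cdn.example.com"


@dataclass
class FeedItem:
    """Feed item as metadata only: the media bytes live in file storage and
    reach clients through the CDN, so the item carries only object keys."""
    item_id: int
    text: str
    media_keys: List[str] = field(default_factory=list)


def to_client_payload(item: FeedItem) -> dict:
    """Response sent to the client: text plus CDN URLs it fetches directly."""
    return {
        "id": item.item_id,
        "text": item.text,
        "media_urls": [f"{CDN_BASE_URL}/{key}" for key in item.media_keys],
    }


payload = to_client_payload(FeedItem(301, "celebrity video", ["videos/301.mp4"]))
print(payload["media_urls"])  # ['https://cdn.example.com/videos/301.mp4']
```

With this layout the feed cache holds small records, and the heavy media traffic goes to CDN edge locations instead of the feed servers.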

Hope this answers your questions.


Hi Design Gurus,

Thanks a lot for answering all the questions. Can you please answer the questions below as well?

In Twitter feed generation, don't we also put videos in the cache? If so, the calculations there give 24 TB of data per day, so three days of tweets amount to about 72 TB (roughly 75 TB), and 20% of that is about 15 TB. If we put video content in the cache as part of the pre-generated feed, we may need 100 cache servers with 150 GB of RAM each. Is that feasible in terms of cost?
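The server estimate in the question can be checked with a few lines of integer arithmetic. This sketch assumes 1 TB = 1,000 GB and caches 20% of the three-day volume, as in the question; `cache_servers_needed` is a hypothetical helper name.

```python
# Back-of-the-envelope numbers from the question (assumes 1 TB = 1,000 GB).
DAILY_DATA_TB = 24    # new tweet data per day, including media
DAYS_CACHED = 3       # keep three days of tweets
HOT_PERCENT = 20      # cache 20% of that volume
SERVER_RAM_GB = 150   # RAM per cache server
GB_PER_TB = 1000


def cache_servers_needed(cached_gb: int, ram_per_server_gb: int) -> int:
    """Smallest number of servers whose combined RAM holds cached_gb."""
    return -(-cached_gb // ram_per_server_gb)  # integer ceiling division


total_gb = DAILY_DATA_TB * DAYS_CACHED * GB_PER_TB   # 72,000 GB (72 TB)
cached_gb = total_gb * HOT_PERCENT // 100            # 14,400 GB (14.4 TB)
servers = cache_servers_needed(cached_gb, SERVER_RAM_GB)

print(total_gb, cached_gb, servers)                     # 72000 14400 96
print(cache_servers_needed(15_000, SERVER_RAM_GB))      # 100
```

The exact figure is 96 servers; rounding the cached volume up to 15 TB, as the question does, gives the round 100.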

As explained in the caching section of the Twitter design, if we store the text of the tweets in the cache, we need a 100 GB cache. In that case, where do we store the metadata of the videos/photos during the Twitter feed generation process?
In this scenario, how do we access the videos/photos? Do we read them from cache servers in front of the file storage when the user pulls the feed, or do we keep the media in a CDN and have the generated feed hold reference links to the CDN content?

Yes, we can put videos/photos in the cache, and if we want to cache three days of data, we do need 100 servers, which is not a big number, by the way! How many servers do you think Twitter needs to serve one billion users? The answer is in the thousands, distributed all over the globe.

Streaming from a CDN is a complete design question in itself. For details, take a look at how Netflix streams videos: https://openconnect.netflix.com/en/

The Akamai paper is one of the best resources on the internals of one of the largest CDNs in the world: https://www.cs.rutgers.edu/~rmartin/teaching/fall15/papers/arch2/cdn.pdf


Hi Design Gurus,

When a user fetches the news feed, the server performs the following steps:

  1. Goes to the UserNewsFeed table in the DB to get the list of feed item IDs.
  2. Uses those IDs to get the feed item contents from the cache servers.
  3. Ranks those feed items and sends them back to the user.
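The three steps above can be sketched as a single read path. Dicts stand in for the DB tables and the cache tier, the fall-back to the item table on a cache miss is an assumption, and the `score` field is a hypothetical placeholder for a real ranking model; all names are illustrative.

```python
# Hypothetical stand-ins (assumptions): dicts for the DB tables and the
# cache tier; "score" is a placeholder for a real ranking signal.
user_news_feed_db = {"alice": [11, 12, 13]}
feed_items_db = {
    11: {"text": "old post", "score": 0.2},
    12: {"text": "popular post", "score": 0.9},
    13: {"text": "recent post", "score": 0.5},
}
cache = {11: feed_items_db[11], 12: feed_items_db[12]}  # 13 is not cached yet


def fetch_news_feed(user_id, limit=20):
    # Step 1: get the list of feed item IDs from the UserNewsFeed table.
    item_ids = user_news_feed_db.get(user_id, [])

    # Step 2: get the contents from the cache servers; on a miss, load the
    # item from its table and populate the cache (assumed fallback).
    items = []
    for item_id in item_ids:
        item = cache.get(item_id)
        if item is None:
            item = feed_items_db[item_id]
            cache[item_id] = item
        items.append(item)

    # Step 3: rank the items (here simply by score, highest first) and
    # return the top `limit` of them to the user.
    items.sort(key=lambda it: it["score"], reverse=True)
    return items[:limit]


print([it["text"] for it in fetch_news_feed("alice")])
# ['popular post', 'recent post', 'old post']
```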

When is the LinkedHashMap used in this process?