One challenge I keep running into when reading educative dot io courses is that the step I consider most important is sometimes missing. For example:
In this course, one lesson covers “Using sharded counters for the Top K problem”. I understand how we increment/decrement (write) the counters in Cassandra. But how do we get the local Top-K trends? Can Cassandra find the Top-K quickly with a single query? (I really doubt it, because there are so many counters.) If Cassandra can do it, what is the data model? That is, what is the row key, what is the column key, and how does the query work?
The following is the only passage that mentions Top K, and it does not answer any of these questions:
When users generate a timeline, read requests are forwarded to the nearest servers, and then the persisted values in the store can be used to respond. This storage also helps to show the region-wise Top K trends. The list of local Top K trends is sent to the application server, and then the application server sorts all the lists to make a list of global Top K trends. Eventually, the application server sends all counters’ details to the cache.
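To make my question concrete, here is how I currently read that paragraph, as a minimal Python sketch. Everything in it is my own assumption and not from the course: the function names `local_top_k` and `global_top_k` are hypothetical, the per-region counters are assumed to be already loaded into a plain dict (which is exactly the part I don't know how to do efficiently in Cassandra), and I assume the application server sums counts for a tag that shows up in more than one region before sorting.

```python
import heapq
from collections import Counter


def local_top_k(counters, k):
    """Return the k (tag, count) pairs with the largest counts for one region.

    Assumption: `counters` is a dict of tag -> count that was already read
    from the region's store. How to get this cheaply is the open question.
    """
    return heapq.nlargest(k, counters.items(), key=lambda item: item[1])


def global_top_k(local_lists, k):
    """Merge per-region top-K lists into one global top-K list.

    Assumption: counts for the same tag are summed across regions.
    """
    totals = Counter()
    for local in local_lists:
        for tag, count in local:
            totals[tag] += count
    return heapq.nlargest(k, totals.items(), key=lambda item: item[1])


# Hypothetical counters for two regions.
us = {"#a": 50, "#b": 30, "#c": 5}
eu = {"#b": 40, "#d": 35, "#a": 1}

local_lists = [local_top_k(us, 2), local_top_k(eu, 2)]
print(global_top_k(local_lists, 2))  # [('#b', 70), ('#a', 50)]
```

Note that merging only the local Top-K lists is approximate: the EU count for `#a` never reaches the application server here because it is outside the EU Top-2. The merge step is the easy part, though. What I still cannot see is how `local_top_k` maps onto a Cassandra query.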
Course: Grokking Modern System Design Interview for Engineers & Managers - Learn Interactively
Lesson: Detailed Design of Sharded Counters - Grokking Modern System Design Interview for Engineers & Managers