We didn’t really propose Cassandra; instead, we presented a few solutions. Let’s first go through them and then see whether any specific solution stands out:
-
First, we presented an RDBMS approach (e.g., MySQL). All metadata goes in tables (the schema is given), and the actual photos go either in distributed storage like HDFS or in cloud storage like S3. Since we do need joins, an RDBMS seems like an easy choice, but these databases come with challenges around scaling and reliability (compare this to Cassandra, which uses quorum-based reads and writes, so its reliability and performance are reasoned about differently). Having said this, these RDBMS issues are manageable (though difficult, especially when building a global service like Instagram). For example, Facebook stores most of its social graph in MySQL and has scaled it well; Facebook probably has the world’s largest MySQL deployment.
-
Any NoSQL store can work too, but we will need to store the relations (the “joins”) ourselves. For example, to find “Followers”, we will need to store the “follower” and “followee” as a key-value pair. This could be any NoSQL store like Redis, Amazon’s DynamoDB, etc. Here are the top key-value data stores: https://db-engines.com/en/ranking/key-value+store
-
Cassandra could be a good fit here. For example, we can store all the “Followers” in separate columns for a “Followee”. A column store, in our case, will give good performance, but it has less flexibility (ref: https://en.wikipedia.org/wiki/NoSQL#Performance). For example, Facebook (the original developer of Cassandra) has nearly stopped using Cassandra, reportedly because of its complexity and limited flexibility. Facebook has developed its own key-value store, called ZippyDB (find their presentation here: https://engineering.fb.com/core-data/inside-data-scale-2015/)
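To make the “store the relations yourself” idea from the NoSQL options concrete, here is a minimal sketch in Python. It uses a plain in-memory dictionary as a stand-in for a key-value or wide-row store. The class name `FollowStore` and the key patterns `followers:<user>` and `following:<user>` are assumptions made for illustration; they are not the API of Redis, DynamoDB, or Cassandra.

```python
from collections import defaultdict


class FollowStore:
    """Toy in-memory stand-in for a key-value store (hypothetical, for illustration)."""

    def __init__(self):
        # Each key maps to a set of user IDs, similar to one wide row per user.
        self._kv = defaultdict(set)

    def follow(self, follower, followee):
        # Without joins, we write the relation in both directions ourselves.
        self._kv[f"followers:{followee}"].add(follower)
        self._kv[f"following:{follower}"].add(followee)

    def unfollow(self, follower, followee):
        # discard() is a no-op if the relation was never stored.
        self._kv[f"followers:{followee}"].discard(follower)
        self._kv[f"following:{follower}"].discard(followee)

    def followers(self, user):
        # A single key lookup replaces the join an RDBMS would perform.
        return sorted(self._kv.get(f"followers:{user}", set()))

    def following(self, user):
        return sorted(self._kv.get(f"following:{user}", set()))


store = FollowStore()
store.follow("alice", "carol")
store.follow("bob", "carol")
store.follow("alice", "bob")

print(store.followers("carol"))  # ['alice', 'bob']
print(store.following("alice"))  # ['bob', 'carol']
```

The trade-off this shows: reads become a single lookup, but every follow or unfollow has to update two keys, and keeping those two writes consistent becomes the application’s job rather than the database’s.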
First things first: although we have not given a clear answer, the above discussion is very relevant for a System Design Interview. In an interview, presenting different options and knowing their trade-offs is quite important, so this is what you should focus on.
Finally, it looks like Cassandra or a simpler key-value store could solve our problem efficiently. But, hey, Facebook is storing their social graph in an RDBMS!