Why use MySQL for metadata and user data instead of NoSQL?


The Instagram design problem says that relational databases come with their challenges, especially when we need to scale them, so NoSQL (Cassandra) is chosen for the Instagram design.

YouTube has videos and users, similar to photos and users on Instagram. Why don’t we choose NoSQL (Cassandra) for YouTube?


I have the exact same question. NoSQL makes more sense to me because there are billions of videos on YouTube, and scaling a relational database vertically is hard and will eventually hit a limit. NoSQL databases like Cassandra, on the other hand, are designed with big data in mind and scale out easily. We would need to come up with the right table schema so that the queries we run on Cassandra are efficient. Keep in mind that Cassandra is optimized for writes, not reads.

We can store user-related information in MySQL, but video metadata must be stored in Cassandra.


@Siddhant_Jawa - I have the same doubt, but then I think: if the SQL DB is sharded properly and most of the videos are served by the cache, does it really matter whether we use SQL or NoSQL?


It does matter, I think. When you shard the data, you have to index based on various criteria. Storing it in MySQL will require you to distribute the data among master and slave servers, resulting in denormalization. NoSQL makes sense to me, especially if I am storing metadata about the users such as comments, notes, and tags.
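To make the sharding point a bit more concrete, here is a minimal sketch of hash-based shard routing. The shard count, the choice of `video_id` as the shard key, and the `shard_for` helper are all assumptions for illustration, not something taken from the tutorial.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count, chosen only for this example


def shard_for(video_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a video ID to a shard index using a stable hash."""
    digest = hashlib.md5(video_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


# Route a handful of made-up video IDs to their shards.
shards = {i: [] for i in range(NUM_SHARDS)}
for vid in ["v1", "v2", "v3", "v4", "v5"]:
    shards[shard_for(vid)].append(vid)
```

A lookup by `video_id` touches exactly one shard. A lookup by any other attribute (say, the uploader) would need a secondary index or a query fanned out to every shard, which is the "index based on various criteria" problem mentioned above.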


Well, the fact is that YouTube really does use MySQL for such purposes. Also, I think this data is highly structured. For incrementing or decrementing like and dislike counts, we need atomicity as well. Maybe those are the reasons they chose MySQL. And it’s a myth that MySQL doesn’t scale.
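As a rough illustration of the atomicity point, here is a small sketch of an atomic like-count increment. It uses Python's built-in `sqlite3` as a stand-in for MySQL so it runs anywhere; the `video` table, its columns, and the `like` helper are hypothetical.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE video ("
    "id INTEGER PRIMARY KEY, title TEXT, likes INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO video (id, title) VALUES (1, 'demo')")
conn.commit()


def like(conn: sqlite3.Connection, video_id: int) -> None:
    """Increment the like count in a single UPDATE inside a transaction."""
    # The read-modify-write happens inside the database, so the application
    # never reads a stale count and writes it back.
    with conn:  # commits on success, rolls back on an exception
        conn.execute(
            "UPDATE video SET likes = likes + 1 WHERE id = ?", (video_id,)
        )


for _ in range(3):
    like(conn, 1)

likes = conn.execute("SELECT likes FROM video WHERE id = 1").fetchone()[0]
```

The same `UPDATE ... SET likes = likes + 1` statement should behave the same way in MySQL; the point is that the increment is one statement in one transaction rather than a separate read followed by a write.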


In my opinion, MySQL is used so that we can do a fast join on the two tables (comment and video) to get all the comments related to a particular video. I agree that NoSQL is great for metadata, especially if we add another table that captures the relationship between videos and comments, which makes it easier to pull all the comments for a particular video.
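The join described above might look roughly like this. Again, this is only a sketch: `sqlite3` stands in for MySQL, and the table layout, the index name, and the `comments_for` helper are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE video (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE comment (
        id INTEGER PRIMARY KEY,
        video_id INTEGER NOT NULL REFERENCES video(id),
        body TEXT NOT NULL
    );
    -- An index on the foreign key keeps the per-video lookup cheap.
    CREATE INDEX idx_comment_video ON comment(video_id);
    INSERT INTO video (id, title) VALUES (1, 'cats'), (2, 'dogs');
    INSERT INTO comment (video_id, body)
        VALUES (1, 'nice'), (1, 'lol'), (2, 'woof');
    """
)


def comments_for(conn: sqlite3.Connection, video_id: int) -> list:
    """Return the comment bodies for one video, oldest first."""
    rows = conn.execute(
        "SELECT c.body FROM comment AS c "
        "JOIN video AS v ON v.id = c.video_id "
        "WHERE v.id = ? ORDER BY c.id",
        (video_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

In a store like Cassandra there is no join, so the usual approach is the "another table" idea from the post: a comments-by-video table keyed by the video ID, so that one partition read returns all comments for that video.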


Overall, across the entire series, I’m disappointed by how the “SQL vs NoSQL” questions are handled. The main argument in all the tutorials was “NoSQL scales better”, which is a pretty weak argument: entire businesses revolve around selling MySQL to companies. And then in this tutorial he chose SQL with no explanation …