educative.io

Educative

Why use mysql for metadata and user data instead of nosql?

In design instagram problem, it says relational databases come with their challenges, especially when we need to scale them. So we choose nosql - cassandra for instagram design.

Youtube has videos and users similar to photos and users on instagram. Why don’t we choose nosql - cassandra for youtube?

1 Like

I have the same exact question. NoSql makes more sense to me because there are billions of videos on youtube and scaling relational database is hard (vertical scaling) and it will reach its limitation. Whereas noSql database like Cassandra are specifically designed with Big Data in mind and they are easily scalable. We need to come up with the right schema for tables so that queries which we run on Cassandra are efficient. Cassandra is write optimized not read.

We can store user related information in Mysql but video metadata must be stored in Cassandra.

1 Like

@Siddhant_Jawa - I have the same doubt but then I think If SQL DB is shared properly and most of the videos are served by the cache. Does it really matter wether we use SQL or NoSQL?

It does matter I think. When you shard the data, you have to index based on various criteria. Storing in MySQL will require you to distribute the data amongst Master and Slave resulting in Denormalization. I do think that NoSQL makes sense to me especially if I am storing metadata about the users such as comments, notes, tags

Well the fact is YouTube really uses MySQL for such purposes. Also , I think the nature of this data is highly structured. For likes count increase or decrease or dislikes, we need atomicity as well. Maybe those are reasons they chose MySQL. And it’s a myth that MySQL doesn’t scale.

2 Likes

From my opinion I think mysql is used to do a faster join on the two tables (comment and video) in order to get all the comments related to a particular video. I agree than Nosql is great for metadata especially if we make another table to demonstrate the relationship between video and comments to make things easier when it comes to pull all the comments for a particular video.

Overall in the entire series, I’m disappointed by the “sql vs nosql” questions. The main argument in all the tutorials was “nosql scales better”, which is a pretty weak argument, entire businesses revolve around selling mysql to companies. And then in this tutorial he chose SQL, no explanation …

6 Likes

I am really not convinced about this entire series for “when to use which technology”. Because in many cases i have seen the purpose of using that technology is not clear. They should have put more details like why this MySql was chosen based on which criteria. Its not always true that SQL is not better for scalable. Only the question comes which DB should we use for which business scenarios. For example Here for storing Metadata and user information we should answer this question:

  • Schema Might change in future a lot ? - Yes (NoSQL) | No (SQL)
  • Do we care about the constituency about the likes and dislikes? - Yes (SQL) | No (NoSQL)
  • Do we have lot of relations in our DB: Yes (SQL) | No (NoSQL)
  • Are we going to do lots of Joins operation on data : Yes(SQL) | No (NoSQL)

Something like this matrix will help to decide and judge the technology.

9 Likes

I felt the same.
Also I found that most of such tutorials are very much influenced by the current design of a particular product.
It is possible that when YouTube was initially developed, NoSQL DBs were not so stable, so they chose MySQL. Probably if Youtube is developed now, they may choose NoSQL db.

The fact is Youtube started with MySQL. It kept using it for the required ACID properties. So they did not want to shift to NoSQL. They use a clustering system called Vitess, which is managing the scaling and topology of their MySQL cluster (surprise!! MySQL does scale!! it just needs external help to do so). It runs on the Google Cloud Platform.

A lot of content is served by Google’s CDN so it reduces the stress on the MySQL clusters. The MySQL clusters are only hit if the CDN doesn’t have the required data. As popular videos and their corresponding metadata(which would have caused the most load on the DB) are viewed many times, the CDN stores them, vis-a-vis …the cluster is not hit for more than a couple times.

2 Likes

I am guessing Youtube indeed uses mysql, and BigTable. Even though the reasons choosing such technology is not sound, but hey who can criticize one of the most successful business in the world. The authors of this series probably divide the task for each design tutorial and the solutions are not consistent with one another, some is more based on experiences rather than reason.

I am disappointed about this Design Youtube and Netflix one, poorly explained. Why encode to different format? Is Netflix design even covered? Netflix and youtube are very different. Why use BigTable? So many questions.

2 Likes