This post is lacking

This is simply not as well thought through as the other articles in this series. Please revisit it and update it.
Capacity estimation is all messed up.
Also, there is no explanation of why SQL was chosen for the database. Some reasoning is needed here, as NoSQL might make way more sense.


I totally agree. This is a poorly written article. The capacity estimation for reads/writes has no connection to the storage/bandwidth estimates. The fault tolerance explanation makes no sense at all. The discussion of staleness in metadata storage with a leader/follower approach lacks explanation for a newcomer who is just starting out with system design concepts. It doesn't say why staleness is bad, e.g. high availability (HA) vs. eventual consistency depending on the use case. This article was given no love at all; it looks like someone wrote it without much time at hand, or just didn't know enough to write it.


Can’t agree more! When I saw the author was going to use MySQL to store metadata and user data, it felt at odds with what I had learned from the earlier articles.

I totally agree that this article does not match the others in terms of quality. I would love to revisit this one once it has been looked at again. I agree that using SQL for the metadata and user data made little sense given the number of videos stored and the heavy read traffic.

Additionally, the capacity estimates seem way off. The 500 hours per minute figure does not align with the previous section. If we assume the average video is 5 minutes long, then the figure is 1150 hours uploaded per minute. Five minutes may be too long, but the current estimate implies an average of a little over 2 minutes. That may be reasonable, but the justification should be stated explicitly.
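To make the arithmetic above explicit, here is a small sketch. It assumes the only relationship in play is hours uploaded per minute = videos per minute × average length / 60. The upload count is back-derived from the numbers in this comment (1150 hours/minute at a 5-minute average), not quoted from the article, and the function names are just illustrative.

```python
# Relationship assumed here:
#   hours_per_minute = videos_per_minute * avg_video_minutes / 60


def videos_per_minute(hours_per_minute: float, avg_video_minutes: float) -> float:
    """Upload count per minute implied by an upload volume and an average length."""
    return hours_per_minute * 60 / avg_video_minutes


def implied_avg_minutes(hours_per_minute: float, videos_per_min: float) -> float:
    """Average video length (in minutes) implied by a volume and an upload count."""
    return hours_per_minute * 60 / videos_per_min


# Assumption: upload count derived from this comment's own figures
# (1150 hours/minute at a 5-minute average), not taken from the article.
uploads = videos_per_minute(1150, 5)          # 13800 videos per minute
avg_len = implied_avg_minutes(500, uploads)   # ~2.17 minutes per video
print(f"{uploads:.0f} videos/min, implied average {avg_len:.2f} min")
```

At the same upload count, the 500 hours/minute figure only works out if the average video is about 2.17 minutes long, which is the "little over 2 minutes" assumption that should be spelled out in the article.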