SQL vs. NoSQL: the course does not explain which one to choose

For choosing between SQL and NoSQL databases to store the above schema, please see ‘Database schema’ under Designing Instagram.

Could you help me understand which one you would choose, why, and what the pros and cons are?


Since the author refers to the Instagram design, whatever was used for Instagram could also be used for Twitter. Think about it: on Instagram people post photos, videos, and text, just like on Twitter. So for photos and videos we can use object storage like S3, and a wide-column store such as Cassandra for the metadata of users and tweets and for the relations between photos/videos and users, or the user-to-user follow relations.

Author, correct me if I am wrong.

If I am not mistaken, Twitter uses NoSQL databases for:

  • Pregenerated user timelines (Redis cluster)
  • Tweets (some key-value store)

And a SQL database for:

  • Following relations (FlockDB, a graph database that stores its data in MySQL)
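
To make the split above concrete, here is a minimal sketch of pregenerated timelines (fan-out on write). It is only an illustration under assumptions: plain Python dicts stand in for the Redis cluster, the key-value tweet store, and the follow graph, and the names `follow`, `post_tweet`, `read_timeline`, and `TIMELINE_LIMIT` are made up for this example, not Twitter's actual API.

```python
from collections import defaultdict, deque

TIMELINE_LIMIT = 3  # hypothetical cap on cached entries, kept tiny for the demo

followers = defaultdict(set)  # stand-in for the follow-graph store: author -> followers
tweets = {}                   # stand-in for the key-value tweet store: id -> tweet
# Stand-in for per-user Redis lists; deque(maxlen=...) drops the oldest entry when full.
timelines = defaultdict(lambda: deque(maxlen=TIMELINE_LIMIT))


def follow(follower, followee):
    """Record that `follower` follows `followee`."""
    followers[followee].add(follower)


def post_tweet(tweet_id, author, text):
    """Store the tweet once, then push its id onto every follower's timeline."""
    tweets[tweet_id] = {"author": author, "text": text}
    for user in followers[author]:
        timelines[user].appendleft(tweet_id)  # newest first


def read_timeline(user):
    """Reading is a cheap lookup: the timeline was built at write time."""
    return [tweets[tweet_id]["text"] for tweet_id in timelines[user]]
```

The write does more work (one push per follower) so that reads stay cheap, which fits a read-heavy workload. Users with a very large number of followers make that write expensive, so a real system would likely treat them differently.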

The advantage of this approach is that NoSQL databases are easy to scale horizontally, so you can achieve higher read and write throughput and more storage capacity without adding much complexity to the system.

To improve the read throughput of a SQL database, you can use a master-slave architecture and read from the replicas (slaves). And of course, in either case, you can use caches to serve the hot and most recent tweets.
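
A rough sketch of that read path, assuming an in-memory stand-in for the database: writes go to the master, reads go to the cache first and then to a randomly chosen replica. The class `ReplicatedStore` and its methods are hypothetical names for this illustration, and `replicate()` is called by hand to mimic asynchronous replication.

```python
import random


class ReplicatedStore:
    """Toy model: one master, several read replicas, and a cache in front."""

    def __init__(self, replica_count=2):
        self.master = {}
        self.replicas = [{} for _ in range(replica_count)]
        self.cache = {}

    def write(self, key, value):
        """All writes go to the master; drop any cached copy of the key."""
        self.master[key] = value
        self.cache.pop(key, None)

    def replicate(self):
        """Copy the master's state to every replica (asynchronous in real systems)."""
        for replica in self.replicas:
            replica.update(self.master)

    def read(self, key):
        """Serve from the cache if possible, otherwise from a random replica."""
        if key in self.cache:
            return self.cache[key]
        value = random.choice(self.replicas).get(key)
        if value is not None:
            self.cache[key] = value  # cache-aside: fill the cache on a miss
        return value
```

Note the trade-off this exposes: a read issued right after a write can miss the new row until replication catches up, so replicas add read capacity at the cost of slightly stale reads.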

The challenge with the SQL approach is mainly in scaling writes and storage capacity. For that you need sharding, and sharding is tricky: you have to deal with problems like slow joins across shards, rebalancing (when the load on some machines gets too heavy), and so on. On top of that, most RDBMSs do not support foreign key constraints across different servers, so you lose one of the most important advantages of SQL databases.
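
To show why rebalancing is painful, here is a small sketch that assumes the simplest possible scheme, `hash(key) % shard_count`. The helpers `shard_for` and `moved_fraction` and the `user{i}` keys are made up for this example; real deployments often use consistent hashing or a lookup directory instead.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard with a stable hash (Python's built-in hash() is salted per process)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def moved_fraction(keys, old_count, new_count):
    """Fraction of keys that land on a different shard after resharding."""
    moved = sum(
        1 for key in keys
        if shard_for(key, old_count) != shard_for(key, new_count)
    )
    return moved / len(keys)


user_ids = [f"user{i}" for i in range(10_000)]
# Growing from 4 to 5 shards relocates most keys under modulo sharding.
print(moved_fraction(user_ids, 4, 5))
```

With a uniform hash, only about one key in five keeps its shard when going from 4 to 5 shards, so adding a single machine means moving most of the data. That is the kind of cost the rebalancing point above refers to.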

Go to this page if you want to find out more about Twitter’s architecture: Detailed explanation of the design