educative.io

Why store pastebin contents in object-storage (S3) instead of the RDBMS table?

The lesson said a paste will be at most 10 MB. That is perfectly storable as a BLOB in any RDBMS, so it could be retrieved as a column in the Paste table. Why, then, does this book recommend using object storage? It gives no explanation for this choice. It does say “This division of data will also allow us to scale them individually.” But why would you want to scale the Paste table separately from the object store? When one grows, the other grows, because both grow whenever a new paste is submitted. How does this justify the extra latency of contacting two separate data stores sequentially (not even in parallel)? It can’t be done in parallel because you need content_key before you can contact the object storage service.
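To make the sequential dependency concrete, here is a minimal sketch. It assumes two in-memory dicts as stand-ins for the Paste table and the object store; `PasteRow`, `create_paste`, `read_paste` and the `pastes/<id>` key format are hypothetical names for illustration, not anything from the course.

```python
from dataclasses import dataclass


@dataclass
class PasteRow:
    paste_id: str
    content_key: str  # pointer into the object store, as in the Paste table


# Hypothetical in-memory stand-ins for the RDBMS table and the object store.
paste_table: dict[str, PasteRow] = {}
object_store: dict[str, bytes] = {}


def create_paste(paste_id: str, content: bytes) -> None:
    # Assumed key format; a real system would pick its own scheme.
    content_key = f"pastes/{paste_id}"
    object_store[content_key] = content                      # write the blob
    paste_table[paste_id] = PasteRow(paste_id, content_key)  # write the metadata


def read_paste(paste_id: str) -> bytes:
    row = paste_table[paste_id]            # round trip 1: metadata lookup
    return object_store[row.content_key]   # round trip 2: needs content_key from 1


create_paste("abc123", b"hello world")
print(read_paste("abc123"))
```

The second lookup cannot start until the first returns, which is exactly the extra round trip I am asking about.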


Great point. My intuition is that even though the metadata table and the actual data store grow together (one new row and one new object per paste), the NoSQL (object) storage grows much faster in bytes, ~100x or even more.
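A rough back-of-the-envelope sketch of that intuition. Only the 10 MB cap comes from the question; the 200-byte metadata row and the 20 KB average paste are made-up assumptions, so read the result as an order of magnitude at best.

```python
# Assumed sizes (NOT from the course), chosen only to illustrate the ratio.
METADATA_ROW_BYTES = 200            # assumption: id, content_key, timestamps, etc.
AVG_PASTE_BYTES = 20 * 1000         # assumption: a typical paste of about 20 KB
MAX_PASTE_BYTES = 10 * 1000 * 1000  # the 10 MB cap mentioned in the question


def growth_ratio(content_bytes: int, row_bytes: int = METADATA_ROW_BYTES) -> float:
    """Bytes added to the content store per byte added to the metadata table."""
    return content_bytes / row_bytes


print(growth_ratio(AVG_PASTE_BYTES))  # 100.0 under the assumptions above
print(growth_ratio(MAX_PASTE_BYTES))  # 50000.0 for a paste at the size cap
```

So both stores gain one entry per paste, but the bytes land almost entirely on the content side, which is why the two may need to be scaled on different schedules.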

Another point is that scaling an RDBMS out to a distributed setup is challenging, so it is generally scaled vertically, whereas NoSQL storage is horizontally scalable. Also, the following links are worth checking out