educative.io

Design Pastebin - Scaling Object Store (S3)

Hey everyone,

At the very end of the Pastebin design walkthrough, the author briefly mentions:

We can store our contents in an Object Storage like Amazon’s S3. Whenever we feel like hitting our full capacity on content storage, we can easily increase it by adding more servers.

As I understand it, they are proposing adding more object storage servers to provide more space, not more CPU or anything like that. That would mean each server holds unique data rather than replicas.

So how does the API server know which object storage server to query for a given paste? Do we associate a server ID with each paste in the metadata database? It seems odd to put a load balancer between the API server and the object store if the object stores do not all hold the same data.
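A minimal sketch of the approach the question describes, assuming each paste's metadata row records which storage node holds its content. The names `PasteMetadata`, `storage_node_id`, `StorageNode`, and `ApiServer` are hypothetical, and plain dicts stand in for the metadata database and the storage nodes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PasteMetadata:
    paste_key: str
    storage_node_id: str  # hypothetical column: which node holds the content


class StorageNode:
    """In-memory stand-in for one object storage server (hypothetical)."""

    def __init__(self, node_id: str) -> None:
        self.node_id = node_id
        self.objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def get(self, key: str) -> bytes:
        return self.objects[key]


class ApiServer:
    def __init__(self, nodes: list[StorageNode]) -> None:
        self.nodes = {n.node_id: n for n in nodes}
        # Stand-in for the metadata database: paste key -> metadata row.
        self.metadata_db: dict[str, PasteMetadata] = {}

    def _pick_node_for_write(self) -> StorageNode:
        # Assumption: send new pastes to the node holding the fewest objects,
        # e.g. a node that was just added for extra capacity.
        return min(self.nodes.values(), key=lambda n: len(n.objects))

    def create_paste(self, paste_key: str, content: bytes) -> None:
        node = self._pick_node_for_write()
        node.put(paste_key, content)
        # Record where the content lives so reads can find it later.
        self.metadata_db[paste_key] = PasteMetadata(paste_key, node.node_id)

    def read_paste(self, paste_key: str) -> bytes:
        # Look up the node ID, then go straight to that node (no load balancer).
        meta = self.metadata_db[paste_key]
        return self.nodes[meta.storage_node_id].get(paste_key)


api = ApiServer([StorageNode("s1"), StorageNode("s2")])
api.create_paste("abc123", b"hello")
print(api.read_paste("abc123"))
```

The cost of this design is one metadata lookup per read. With S3 the client addresses objects only by bucket and key, and AWS handles placement behind that interface.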

The author explained the S3 part poorly. As far as I know, S3 is a managed service: AWS takes care of the availability of your storage, so replication is not something the S3 user does.
I also believe the load balancer in front of S3 that they mention is unnecessary, since that is also managed by AWS.
https://aws.amazon.com/s3/faqs/?nc=sn&loc=7

Yes, indeed. There is no way to add more S3 servers when we feel we are hitting full capacity on content storage. In fact, S3 has no fixed capacity: you can just keep storing more, and they'll just charge you for it!
Also, the caching diagram is a bit off. You may decide to keep objects cached, but S3 wouldn't know how to add things to the cache on its own, so the app server would have to look in the cache first and only then hit S3 (or whichever object store you chose).
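The read path described above is the cache-aside pattern: the app server checks the cache, falls back to the object store on a miss, and then populates the cache itself. A minimal sketch, assuming a small in-process LRU cache stands in for something like Memcached or Redis and a dict stands in for the object store; `LRUCache` and `read_paste` are hypothetical names.

```python
from __future__ import annotations

from collections import OrderedDict
from typing import Optional


class LRUCache:
    """Small bounded LRU cache standing in for Memcached/Redis (hypothetical)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data = OrderedDict()  # key -> paste content, oldest first

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def set(self, key: str, value: bytes) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used


def read_paste(key: str, cache: LRUCache, object_store: dict[str, bytes]) -> bytes:
    # 1. The app server checks the cache first.
    cached = cache.get(key)
    if cached is not None:
        return cached
    # 2. On a miss, it fetches from the object store (S3 in the original design).
    content = object_store[key]
    # 3. The app server, not the object store, writes the result into the cache.
    cache.set(key, content)
    return content
```

The object store never talks to the cache in this sketch; all cache reads and writes go through the app server, which is the point made above about the diagram.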