educative.io

File Storage in Chunks

Hi! The design considerations mention storing files in chunks so that only the chunks (parts of the file) that have changed need to be uploaded or downloaded.
However, it is not clear how files are actually stored in chunks. There are existing NoSQL databases, such as MongoDB and Cassandra, that have built-in mechanisms for storing files in chunks, but can someone give a basic explanation of how the chunks are actually stored internally?

Hi @Shivam_Saxena, hope you are doing well.
As far as I understand, the chunks are not stored in the database. If you think about it, a file is just a sequence of bytes, right? So the idea here is that we take a file and split it into chunks of, let’s say, 100 bytes. We store in our metadata DB that file X, which has 500 bytes, is composed of chunk A (0-99), B (100-199), C (200-299),… If something changes in file X, we can check where the change was made (in which chunk) and send just the changed chunk (let’s say, chunk C) to our remote servers. The devices will then download this 100-byte chunk and update the metadata DB.
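To make this concrete, here is a minimal sketch of the idea, not how any particular product does it. It assumes fixed 100-byte chunks as in the example, and it assumes the metadata DB keeps a SHA-256 hash per chunk so that changed chunks can be found by comparing hashes. The names `split_into_chunks`, `build_manifest` and `changed_chunks` are made up for illustration.

```python
import hashlib

CHUNK_SIZE = 100  # bytes, matching the example above; an assumption, not a recommendation


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Cut a byte string into consecutive fixed-size chunks (the last may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def build_manifest(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[dict]:
    """Hypothetical metadata record: one entry per chunk with its byte range and hash."""
    manifest = []
    for index, chunk in enumerate(split_into_chunks(data, chunk_size)):
        start = index * chunk_size
        manifest.append({
            "index": index,
            "start": start,
            "end": start + len(chunk) - 1,
            "sha256": hashlib.sha256(chunk).hexdigest(),
        })
    return manifest


def changed_chunks(old_manifest: list[dict], new_manifest: list[dict]) -> list[int]:
    """Indexes whose hash differs, including chunks added or removed at the end."""
    changed = []
    for i in range(max(len(old_manifest), len(new_manifest))):
        old = old_manifest[i]["sha256"] if i < len(old_manifest) else None
        new = new_manifest[i]["sha256"] if i < len(new_manifest) else None
        if old != new:
            changed.append(i)
    return changed


original = bytes(500)            # file X: 500 bytes -> chunks A..E
edited = bytearray(original)
edited[250] = 1                  # change one byte inside chunk C (200-299)
print(changed_chunks(build_manifest(original), build_manifest(bytes(edited))))  # [2]
```

Only chunk index 2 (chunk C) is reported, so only those 100 bytes would need to be sent. One caveat with fixed-size chunks: inserting or deleting bytes in the middle of the file shifts every later chunk, so all of them would show up as changed.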
In summary, the chunks actually are the file itself. The metadata DB is just a way to keep track of the changes. We could also implement other cool backup mechanisms: for example, we can store the change history, keep the last K chunks of a file and, if needed, rebuild the file from those chunks, which are also stored locally. Hope this helps!
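As a rough illustration of the history idea, the sketch below keeps every version of a file as an ordered list of chunk hashes and rebuilds any version from a local chunk store. Everything here is an assumption for the sake of the example: the in-memory `chunk_store` and `history` stand in for local chunk storage and the metadata DB, and `save_version` and `rebuild` are hypothetical helpers.

```python
import hashlib

chunk_store: dict[str, bytes] = {}   # hypothetical local store: chunk hash -> chunk bytes
history: list[list[str]] = []        # hypothetical metadata: ordered chunk hashes per version


def save_version(data: bytes, chunk_size: int = 100) -> None:
    """Record a new version; chunks already in the store are not stored again."""
    hashes = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        chunk_store.setdefault(digest, chunk)
        hashes.append(digest)
    history.append(hashes)


def rebuild(version: int) -> bytes:
    """Reassemble a version by concatenating its chunks in order."""
    return b"".join(chunk_store[h] for h in history[version])


v0 = bytes(range(250)) * 2                  # 500 bytes, five distinct chunks
v1 = v0[:200] + b"x" * 100 + v0[300:]       # same file with chunk C rewritten
save_version(v0)
save_version(v1)
print(rebuild(0) == v0, rebuild(1) == v1)   # True True
```

Both versions can be restored even though the second one added only a single new chunk to the store.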

Artur Baruchi


How would we know which chunk was edited?