educative.io

Doubt in the approach

In-line deduplication

Alternatively, deduplication hash calculations can be done in real-time as the clients are entering data on their device. If our system identifies a chunk that it has already stored, only a reference to the existing chunk will be added in the metadata, rather than a full copy of the chunk. This approach will give us optimal network and storage usage.

  1. If the user makes a change in a file, the changed chunk needs to be stored in the Block/Storage DB, and that chunk's reference also needs to be updated in the metadata, right?
    So even for a small update, or for a new file, we need to send the chunk over the network.

Please correct me if I am wrong.

In the In-line deduplication section, what does "If our system identifies a chunk that it has already stored, only a reference to the existing chunk will be added in the metadata" mean? If there is a change in the file, I guess we get a new hash value, so we need to save the chunk in the Storage DB and also update the metadata DB. So what does "only a reference to the existing chunk will be added in the metadata" mean here?