educative.io

Storage and Bandwidth Estimation?

Storage estimates: “Let’s assume that every minute 500 hours worth of videos are uploaded to Youtube. If on average, one minute of video needs 50MB of storage.
500 hours * 60 min * 50MB => 1500 GB/min (25 GB/sec)”

  1. Why do we assume that 500 hours of video are uploaded every minute? I understand starting an estimate from a user count in the millions, but how do we arrive at 500 hours per minute?

Bandwidth estimates: "With 500 hours of video uploads per minute and assuming each video upload takes a bandwidth of 10MB/min, we would be getting 300GB of uploads every minute.

500 hours * 60 mins * 10MB => 300GB/min (5GB/sec)"

  2. Can we just use the 25GB/sec storage figure as the bandwidth? Why do we estimate bandwidth separately, assume 10MB/min, and get 5GB/sec?
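
For reference, the two lecture figures quoted above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic in decimal units (1 GB = 1000 MB); the constant and function names are made up for illustration, and it does not explain why the lecture picks 10MB/min.

```python
# Reproduce the two quoted lecture figures (decimal units: 1 GB = 1000 MB).
UPLOADED_HOURS_PER_MIN = 500      # hours of video uploaded each minute (lecture)
STORAGE_MB_PER_VIDEO_MIN = 50     # MB stored per minute of video (lecture)
BANDWIDTH_MB_PER_VIDEO_MIN = 10   # MB transferred per minute of video (lecture)


def rate_gb(mb_per_video_minute):
    """Return (GB per minute, GB per second) for a given MB cost per video-minute."""
    video_minutes = UPLOADED_HOURS_PER_MIN * 60   # 30,000 video-minutes arrive each minute
    gb_per_min = video_minutes * mb_per_video_minute / 1000
    return gb_per_min, gb_per_min / 60


storage = rate_gb(STORAGE_MB_PER_VIDEO_MIN)       # (1500.0, 25.0)
bandwidth = rate_gb(BANDWIDTH_MB_PER_VIDEO_MIN)   # (300.0, 5.0)
print(storage, bandwidth)
```

The only input that differs between the two results is the per-minute figure (50MB vs 10MB), which is where the 5x gap between 25GB/sec and 5GB/sec comes from.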

Does anyone have any ideas?

6 Likes

Here are my thoughts about how to estimate the storage and bandwidth.
Assuming we have 800M daily active users, each watching 5 videos, and a read/write ratio of 200:1 (one upload per 200 views), then 800M * 5/200 = 20M videos are uploaded daily. Assuming the average length of a video is 4 min (I recall reading somewhere that it is 4:20) and one minute of video needs 50MB of storage, we get 20M * 4 min * 50MB = 80 * 10^6 * 50 * 10^6 bytes = 4000 * 10^12 bytes = 4000 TB per day.
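
The same estimate as a small Python sketch, so the assumptions are easy to change. All inputs are the guesses from this post, not measured values, and the names are illustrative only.

```python
# Daily upload volume derived from daily active users (all inputs are assumptions).
DAILY_ACTIVE_USERS = 800_000_000
VIEWS_PER_USER = 5        # videos watched per user per day
READS_PER_WRITE = 200     # views per uploaded video
AVG_VIDEO_MIN = 4         # average video length in minutes
MB_PER_VIDEO_MIN = 50     # storage per minute of video


def daily_upload_estimate():
    """Return (videos uploaded per day, TB stored per day), with 1 TB = 10^6 MB."""
    views = DAILY_ACTIVE_USERS * VIEWS_PER_USER
    uploads = views // READS_PER_WRITE
    storage_mb = uploads * AVG_VIDEO_MIN * MB_PER_VIDEO_MIN
    return uploads, storage_mb / 1_000_000


print(daily_upload_estimate())  # (20000000, 4000.0)
```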

Bandwidth estimation: With 4000 TB of uploaded data a day, the uplink averages about 46.3 GB/sec (4000 TB / 86,400 sec) and, with 200 reads per write, the downlink averages about 9.3 TB/sec. Obviously, traffic is not evenly distributed throughout the day, so peak load should be taken into consideration.
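
A rough sketch of that conversion, using decimal units (1 TB = 1000 GB); with 1 TB = 1024 GB the uplink comes out slightly higher, around 47 GB/sec. The `peak_factor` parameter is a hypothetical knob for uneven traffic, not something taken from the lecture.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_bandwidth(daily_upload_tb, reads_per_write, peak_factor=1.0):
    """Return (uplink GB/sec, downlink TB/sec) for a daily upload volume in TB.

    peak_factor is an assumed multiplier for busy hours; 1.0 means traffic
    is spread perfectly evenly over the day.
    """
    uplink_gb_s = daily_upload_tb * 1000 / SECONDS_PER_DAY * peak_factor
    downlink_tb_s = uplink_gb_s * reads_per_write / 1000
    return uplink_gb_s, downlink_tb_s


up, down = average_bandwidth(4000, 200)
print(round(up, 1), round(down, 2))  # 46.3 9.26
```

Calling `average_bandwidth(4000, 200, peak_factor=2.0)` would double both numbers, which is one simple way to budget for peak hours.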

4 Likes

Something is not right with the bandwidth calculation.

The 500 is hours of video per minute.
The 10MB/min is the bandwidth per video.

We would need to know the number of minutes per video (at least the average) to get the bandwidth requirement.

1 Like

You can think of the storage estimate as how much data is being generated per second, e.g. 25GB/sec. But this much data can’t be consumed by the network straight away, as the network has bandwidth limitations. That’s why we need a processing queue to temporarily store all the videos.

The bandwidth estimate should take into account upload/download speed, which is given as 10MB/min in this lecture.

2 Likes

I also have a question about the "every minute 500 hours worth of videos" figure and the bandwidth calculation.

My conclusion is that, with or without the bandwidth limitation in mind, we get the same answer.

If we assume 230 videos/sec are uploaded (stated in the lecture), and we also assume 5 min/video and 50M per minute of video, then we have
230 (videos/sec) * 60 (sec/min) * 5 (min/video) * 50 (M/min) = 3369 G/min = 56 G/s (taking 1 G = 1024 M).

As we know, uploads may not be evenly distributed across the day, so the actual number may be much larger or smaller.

But the key thing to notice is that, regardless of its size, a video may not finish uploading within a minute, because the client's upload bandwidth is limited. So we assume a 10M/min upload bandwidth per client. In turn, this affects the server’s ingress bandwidth.

So let’s analyze it with the bandwidth limitation in mind:

Each video is (5min/video) * (50M/min) = 250 M in size on average, and the bandwidth is 10M/min. So we need 25 min to finish uploading a video.

Each minute we have 230*60 = 13,800 new videos.

With 13,800 videos starting to upload each minute, that is 13,800 * 10M = 134.77G per minute.
This is the bandwidth we need to start uploading these 13,800 videos, and we need to sustain it for 25 minutes.

As the diagram below shows, at every moment new connections open for new uploads while old connections close, so the bandwidth has to be multiplied by the number of overlapping batches. The cycle is 25 minutes.

So we also need to multiply the bandwidth by 25.

134.77G * 25 = 3369 G/min

This is exactly the same as the result of the analysis at the beginning.
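
Here is a quick sketch that checks the two approaches against each other. The second function is just Little's law (uploads in flight = arrival rate * upload duration). The inputs are the assumptions from this post, and the function names are made up for illustration.

```python
VIDEOS_PER_SEC = 230            # upload rate stated in the lecture
AVG_VIDEO_MIN = 5               # assumed average video length in minutes
SIZE_MB_PER_VIDEO_MIN = 50      # assumed size per minute of video
CLIENT_UPLOAD_MB_PER_MIN = 10   # assumed per-client upload speed


def direct_ingress_mb_per_min():
    """Ignore client limits: every new video's bytes arrive within the minute."""
    new_videos_per_min = VIDEOS_PER_SEC * 60
    return new_videos_per_min * AVG_VIDEO_MIN * SIZE_MB_PER_VIDEO_MIN


def throttled_ingress_mb_per_min():
    """Steady state with slow clients: uploads in flight times per-client speed."""
    new_videos_per_min = VIDEOS_PER_SEC * 60
    video_size_mb = AVG_VIDEO_MIN * SIZE_MB_PER_VIDEO_MIN       # 250 MB
    upload_minutes = video_size_mb / CLIENT_UPLOAD_MB_PER_MIN   # 25 minutes
    concurrent_uploads = new_videos_per_min * upload_minutes    # Little's law
    return concurrent_uploads * CLIENT_UPLOAD_MB_PER_MIN


print(direct_ingress_mb_per_min(), throttled_ingress_mb_per_min())
```

Both come out to 3,450,000 M/min, which is the 3369 G/min above when 1 G = 1024 M. The per-client speed cancels out: slower clients mean more connections open at once, not less total traffic.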

For simplicity, the diagram uses a cycle of 4 instead of 25, but the pattern is the same.

----|
 ---|-
  --|--
   -|---
    |----
    | ----
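
The overlap in the diagram can also be simulated minute by minute. This is a minimal sketch assuming a batch of uploads starts at every whole minute from minute 0 and each upload lasts exactly `upload_minutes`; the function names are hypothetical.

```python
def active_uploads(minute, starts_per_min, upload_minutes):
    """Uploads in flight during `minute` when a batch starts at every minute >= 0."""
    first_start = max(0, minute - upload_minutes + 1)   # oldest batch still uploading
    return starts_per_min * (minute - first_start + 1)


def ingress_per_minute(total_minutes, starts_per_min, upload_minutes, mb_per_min):
    """Server ingress (M per minute) for each of the first `total_minutes` minutes."""
    return [
        active_uploads(m, starts_per_min, upload_minutes) * mb_per_min
        for m in range(total_minutes)
    ]


# Cycle of 4 as in the diagram: one upload starts per minute at 10M/min each.
ramp = ingress_per_minute(6, 1, 4, 10)
print(ramp)  # [10, 20, 30, 40, 40, 40]
```

The load ramps up for the first cycle and then stays flat at (starts per minute) * (cycle length) * (per-client speed). With 13,800 starts per minute, a 25-minute cycle, and 10M/min, the flat level is 3,450,000 M/min.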