Thank you for offering this great course.
I am really stuck on the use of reshape in the solution that implements Batch Normalization manually.
I have a problem understanding the offered solution here.
- Why do Gamma and Beta have a shape that matches the channel dimension?
- Why did we reshape the mean like this:
variance = torch.mean((X - mean.reshape((1, C, 1, 1))) ** 2, dim=(0, 2, 3))
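
To make the question concrete, here is a minimal sketch of what I think is going on, assuming X has shape (N, C, H, W) and the statistics are computed per channel. The sizes, the `eps` value, and the `gamma`/`beta` initial values are my own assumptions for illustration and are not taken from the course solution. My understanding is that `mean` comes out with shape (C,), and that broadcasting aligns shapes from the last dimension, so it has to be reshaped to (1, C, 1, 1) to line up with the channel axis of X.

```python
import torch

# Assumed toy sizes: batch N, channels C, height H, width W
N, C, H, W = 4, 3, 5, 5
torch.manual_seed(0)
X = torch.randn(N, C, H, W)

# Reducing over batch and spatial dims leaves one value per channel: shape (C,)
mean = torch.mean(X, dim=(0, 2, 3))

# Broadcasting aligns trailing dims, so (C,) would be matched against W, not C.
# Here W != C, so the subtraction without reshape fails outright.
try:
    X - mean
    raised = False
except RuntimeError:
    raised = True

# Reshaping to (1, C, 1, 1) lines the values up with the channel axis of X
mean_b = mean.reshape((1, C, 1, 1))
variance = torch.mean((X - mean_b) ** 2, dim=(0, 2, 3))

# Cross-check against PyTorch's biased variance over the same dims
reference = X.var(dim=(0, 2, 3), unbiased=False)

# Gamma and beta (assumed initial values) hold one scale and one shift per channel
eps = 1e-5
gamma = torch.ones(C)
beta = torch.zeros(C)
X_hat = (X - mean_b) / torch.sqrt(variance.reshape((1, C, 1, 1)) + eps)
out = gamma.reshape((1, C, 1, 1)) * X_hat + beta.reshape((1, C, 1, 1))
```

If W happened to equal C, I believe the version without reshape would run without an error but subtract the means along the wrong axis, which seems to be a further reason for the explicit reshape.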