https://www.educative.io/courses/beginners-guide-to-deep-learning/7DrjBDrVB0O

Why is db = np.sum(Error)? The derivative dE/db should be y - y_pred, but in the code Error is defined as y_pred - y, so the bias update seems to be written the reverse way. This is in the lesson "Gradient Descent: The Batch Update", under "Coding the perceptron training rule". The exact line is db = np.sum(Error) # Compute derivative of error w.r.t bias.

However, in the lesson "Stochastic Gradient Descent: The Stochastic Update", the formula appears the other way around.

Line 24 there reads db = target - actual # gradient of bias.

Could you explain this?

Hi @Vikrant!!
In the provided code, db = np.sum(Error) is the derivative of the error with respect to the bias, used in the gradient calculation for the batch update of the perceptron algorithm.

In the context of the perceptron algorithm, the Error variable represents the difference between the predicted values (Y_predicted) and the target values (Y). This difference is often referred to as the error or the residual.

When calculating the derivative of the error with respect to the bias (db), the correct formulation should be db = np.sum(Error). Here’s why:

  1. The derivative of the error with respect to the bias (db) follows from the chain rule of calculus. Each prediction is Y_predicted = X·w + b, so the derivative of every prediction with respect to the bias is 1; the gradient of the total error is therefore simply the sum of the error terms.

  2. In the provided code, the Error variable represents Y_predicted - Y, which matches the definition of the error used in the lesson. With this convention the gradient points uphill, so the update subtracts it (b = b - learning_rate * db). Writing the error as Y - Y_predicted instead only flips the sign, which is then compensated by adding the update rather than subtracting it; the two formulations move the bias in the same direction.

  3. By summing up the error terms, you obtain the derivative of the error with respect to the bias, as the sketch after this list shows.
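
Here is a minimal sketch of what that batch update looks like end to end, assuming a linear output (Y_predicted = Xw + b) and a squared-error loss; the data and variable names are illustrative, not the lesson's exact code:

```python
import numpy as np

# Illustrative data: a simple OR-style target (not the lesson's dataset)
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
Y = np.array([1.0, 1.0, 1.0, 0.0])

w = np.zeros(2)
b = 0.0
lr = 0.1

for epoch in range(100):
    Y_predicted = X @ w + b        # predictions for all samples at once
    Error = Y_predicted - Y        # prediction minus target, as in the batch lesson
    dw = X.T @ Error               # dE/dw: each error term weighted by its input
    db = np.sum(Error)             # dE/db: the bias has no input, so just sum the errors
    w -= lr * dw                   # subtract, because Error = Y_predicted - Y points "uphill"
    b -= lr * db
```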

Regarding your mention of Stochastic Gradient Descent (SGD): the formula db = target - actual is not a general rule for updating the bias in SGD; it is one sign convention among several. Likewise, db = np.sum(Error) is appropriate for the batch update of the perceptron algorithm, not for SGD, where the update is made one sample at a time.

In SGD, the gradient update is performed iteratively on individual training examples, and the update rule for the bias depends on the specific implementation: it could be db = Error or db = learning_rate * Error, with the sign of the error determining whether the term is added to or subtracted from the bias. The choice of update rule depends on the particular SGD implementation and the objective being minimized.
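
For comparison, here is a minimal sketch of the per-sample convention you quoted, again with illustrative data; because db = target - actual has the opposite sign to Error above, the term is added to the bias rather than subtracted:

```python
import numpy as np

# Same illustrative OR-style data as the batch sketch (not the lesson's dataset)
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
Y = np.array([1.0, 1.0, 1.0, 0.0])

w = np.zeros(2)
b = 0.0
lr = 0.1

for epoch in range(100):
    for x, target in zip(X, Y):
        actual = x @ w + b         # prediction for a single sample
        db = target - actual       # sign is flipped relative to Error = Y_predicted - Y
        dw = db * x
        w += lr * dw               # add, because this term already points "downhill"
        b += lr * db
```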

Therefore, in the context of the provided code, the calculation db = np.sum(Error) is correct for the batch update in the perceptron algorithm, but it may not be applicable to other algorithms or different versions of gradient descent.
I hope this helps. Happy Learning :blush: