Please check the following statements

The backpropagation from output to hidden layer basically involves:

  1. The slope of the loss function, i.e., the error, which is equal to (out_y - target_y).
  2. The slope of the activation function, i.e., the sigmoid derivative of the node the error is coming from (the output layer, in this case), i.e., out_y * (1 - out_y).
  3. The value of the node that feeds into the weight (i.e., the hidden layer unit, out_h).
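Put together, these three factors are just the chain rule applied to a single hidden-to-output weight (my paraphrase of the statements above, assuming the error term given in point 1):

dL/dw = (out_y - target_y) * out_y * (1 - out_y) * out_h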

Based on the statements above, the following pseudocode is given on the page:

l2_error = out_y - target_y
dw = l2_error . out_h
db = l2_error

My doubt: statement 2 above and the second line of the pseudocode do not match. While calculating the weight gradient, the sigmoid derivative is not taken into account; dw is just the product of the error (actual minus target) and the output of the hidden layer, which confuses me. And this is not the case in backpropagation from hidden to input layer.

@Javeria_Tariq could you explain my doubt please?


Hi @Vikrant !!
Thanks for identifying this. Yes, the pseudocode provided is missing the application of the sigmoid derivative for computing the gradient of the weights. In backpropagation, the gradient of the weights connecting the output and hidden layers is computed by taking into account both the error and the derivative of the activation function.

To address this, we need to multiply the error by the derivative of the sigmoid activation function when computing the gradient of the weights. Here’s the revised pseudocode:

# Backpropagation from output to hidden layer

# Step 1: Compute the output layer error
l2_error = out_y - target_y

# Step 2: Compute the output layer slope (derivative of the activation function)
slope_output = out_y * (1 - out_y)

# Step 3: Compute hidden-to-output weights gradient
dw = l2_error * slope_output * out_h

# Step 4: Compute bias gradient for the output layer
db = l2_error * slope_output

By including the derivative of the sigmoid activation function (slope_output), we ensure that the gradients are appropriately adjusted during backpropagation, leading to effective weight updates and learning in the neural network.
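As a quick sanity check, the four steps above can be run end to end in NumPy. This is a minimal sketch with made-up numbers (the values of `out_h`, `out_y`, and `target_y` below are illustrative, not taken from the lesson):

```python
import numpy as np

# Hypothetical single-sample values, chosen only for illustration
out_h = np.array([0.4, 0.7])   # hidden layer activations
out_y = 0.8                    # output layer activation (sigmoid)
target_y = 1.0                 # target value

# Step 1: output layer error
l2_error = out_y - target_y            # -0.2

# Step 2: sigmoid derivative at the output
slope_output = out_y * (1 - out_y)     # 0.8 * 0.2 = 0.16

# Step 3: gradient of the hidden-to-output weights
dw = l2_error * slope_output * out_h   # ≈ [-0.0128, -0.0224]

# Step 4: gradient of the output bias
db = l2_error * slope_output           # ≈ -0.032

print(dw, db)
```

Note that without Step 2, `dw` would be `l2_error * out_h`, which is exactly the discrepancy flagged in the question.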

Regarding your point about backpropagation from the hidden to the input layer, the process is similar. In that case, we’ll also need to compute the slope of the activation function for the hidden layer neurons and use it to calculate the gradients with respect to the hidden layer weights and biases.
We will update this soon. Thanks again!


Tariq, thanks for the prompt inputs and response. Could you also update the exercises and examples for consistency, please?
Backpropagation is a difficult subject to grasp; however, Educative has provided a very good set of examples, and correcting this issue would help many other learners.

Hi @Javeria_Tariq, thanks for responding to the later queries related to matrix multiplication and transpose.

Could you also comment on the anomaly listed in this question, please? I’ve written a detailed email to Educative pointing out this scenario: while calculating the output gradient, the derivative of the sigmoid is not taken into consideration, because of which the code segment is not correct. Similarly, the derivative of ReLU is not taken into consideration in the challenge, which is why the challenge related to the 3-layered network requires an update. I request the content team to look into the assessment as well. Please provide your inputs.



@Javeria_Tariq I’ve received responses from you for all my queries except the one above. I’ve posted my comments earlier as well, hoping for answers from you or the Educative team, but there has been no response.
You have an excellent course here on Deep Learning, and I am requesting inputs so that the code is in sync with the pseudocode you have corrected. Could you explain the hold-up, please? I’m not getting responses to my emails either. Unfortunately, the only remaining route is LinkedIn or Twitter, which I would like to avoid. I request you to provide your inputs, please.

Hi @Vikrant !!
Our sincere apologies for the delayed response to your query. You are absolutely right: the correct expressions should involve the derivatives of the activation functions, just like in the revised pseudocode. We have now updated the quiz in the lesson accordingly.

Additionally, we have addressed the issue of computing the gradient of the weights connecting the output and hidden layers. The correct method now includes the derivative of the sigmoid activation function. The code has been updated to reflect this change.

Thank you for bringing these issues to our attention, and we appreciate your understanding as we continuously work to improve our materials and provide accurate information. If you have any more questions or concerns, please don’t hesitate to reach out to us.
Happy Learning