Hi Team, could you provide some general guidelines for avoiding dimension errors when multiplying and transposing matrices?
My code below is based on "Train the XOR Multilayer Perceptron" from the previous chapter, to stay consistent with it. However, the solution you've provided for this challenge is a little different. Here is my solution; please also see the inline comments, which point out where my code differs from yours:
import numpy as np

def backpropagation(y, out_y, out_h2, out_h1, w3, w2, x):
    """
    Computes the backpropagation operation for the
    3-layered neural network and returns the gradients
    of weights and biases
    """
    l3_error = out_y - y
    dw3 = np.dot(l3_error, out_h2.T)  # your solution: dw3 = np.dot(out_h2.T, l3_error)
    db3 = np.sum(l3_error, axis=1, keepdims=True)  # your solution uses axis=0
    dh_2 = np.dot(w3.T, l3_error)  # your solution: dh2 = np.dot(w3, l3_error)
    l2_error = np.multiply(dh_2, out_h2 * (1 - out_h2))  # your solution: l2_error = np.multiply(dh2.T, out_h2 * (1 - out_h2))
    dw2 = np.dot(l2_error, out_h1.T)  # your solution: dW2 = np.dot(out_h1.T, l2_error)
    db2 = np.sum(l2_error, axis=1, keepdims=True)
    dh_1 = np.dot(w2.T, l2_error)  # your solution: dh1 = np.dot(w2, l2_error.T)
    l1_error = np.multiply(dh_1, out_h1 * (1 - out_h1))  # your solution: l1_error = np.multiply(dh1.T, out_h1 * (1 - out_h1))
    dw1 = np.dot(l1_error, x.T)  # your solution: np.dot(x.T, l1_error)
    db1 = np.sum(l1_error, axis=1, keepdims=True)
    return dw3, db3, dw2, db2, dw1, db1
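To convince myself that my version is at least dimensionally consistent, I shape-traced it with random arrays. This is a minimal sketch; the layer sizes below are my own assumption, with samples stored column-wise as in the previous chapter:

import numpy as np

# Hypothetical sizes (my assumption, not from the course):
# n_x = 2 inputs, h1 = h2 = 2 hidden units, n_y = 1 output, m = 4 samples.
m, n_x, h1, h2, n_y = 4, 2, 2, 2, 1

x      = np.random.rand(n_x, m)   # (2, 4), samples as columns
out_h1 = np.random.rand(h1, m)    # (2, 4)
out_h2 = np.random.rand(h2, m)    # (2, 4)
out_y  = np.random.rand(n_y, m)   # (1, 4)
y      = np.random.rand(n_y, m)   # (1, 4)
w2     = np.random.rand(h2, h1)   # (2, 2)
w3     = np.random.rand(n_y, h2)  # (1, 2)

dw3, db3, dw2, db2, dw1, db1 = backpropagation(y, out_y, out_h2, out_h1, w3, w2, x)
print(dw3.shape, dw2.shape, dw1.shape)  # (1, 2) (2, 2) (2, 2) -- each matches its weight matrix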
The solution above follows the example from the previous chapter (copied at the bottom of this post), in which the dot products and transposes for backpropagation were performed in a different way.
It is very tedious to visualize the shapes or remember the order of the operands while writing the code. If you look at the conceptual logic, my reasoning was correct; it was only the transposes and the order of the dot products that differed, which is where I got stuck.
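To make the clash concrete, here is a small shape trace I put together (the sizes and variable names are my own illustration, not from the course). If samples are stored as columns, the weight gradient has to be error-first with the activation transposed; if samples are stored as rows, which is what the challenge solution seems to assume, the order flips:

import numpy as np

m, n_in, n_out = 4, 2, 1  # hypothetical: 4 samples, 2 inputs to the layer, 1 output

# Convention A (previous chapter): samples are columns.
error_cols = np.random.rand(n_out, m)                # (1, 4)
act_cols = np.random.rand(n_in, m)                   # (2, 4)
dW_cols = np.dot(error_cols, act_cols.T)             # (1, 2): error first, activation transposed
db_cols = np.sum(error_cols, axis=1, keepdims=True)  # sum over axis 1, the sample axis

# Convention B (what the challenge solution appears to assume): samples are rows.
error_rows = np.random.rand(m, n_out)                # (4, 1)
act_rows = np.random.rand(m, n_in)                   # (4, 2)
dW_rows = np.dot(act_rows.T, error_rows)             # (2, 1): activation transposed, error second
db_rows = np.sum(error_rows, axis=0, keepdims=True)  # sum over axis 0, the sample axis

# Each gradient matches the weight layout of its own convention;
# the two results are transposes of each other.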
Question 1: I need some pointers for catching these errors while writing the logic; otherwise it turns into trial and error. Could you assist?
Question 2: Why wasn't the pattern from the previous chapter's code (below) followed in the challenge solution?
Question 3: While solving the challenge, how did you determine the dot products and matrix transposes so quickly?
Question 4: Are there any guidelines or best practices for figuring this out?
Ideally, the code above would be an extension of the code from the previous chapter, which is not the case.
Finally, I'm copying the code from the previous chapter below:
Train the XOR Multilayer Perceptron
def backward_propagation(X, Y, out_h, out_y, W2):
    """
    Computes the backpropagation operation of a neural network and
    returns the derivative of weights and biases
    """
    l2_error = out_y - Y  # error at the output layer: actual - target
    dW2 = np.dot(l2_error, out_h.T)  # derivative of layer 2 weights is the dot product of the error at layer 2 and the hidden-layer output
    db2 = np.sum(l2_error, axis=1, keepdims=True)  # derivative of layer 2 bias is simply the error at layer 2
    dh = np.dot(W2.T, l2_error)  # dot product of layer 2 weights with the error at layer 2
    l1_error = np.multiply(dh, out_h * (1 - out_h))  # compute layer 1 error
    dW1 = np.dot(l1_error, X.T)  # derivative of layer 1 weights is the dot product of the error at layer 1 and the input
    db1 = np.sum(l1_error, axis=1, keepdims=True)  # derivative of layer 1 bias is simply the error at layer 1
    return dW1, db1, dW2, db2  # return the derivatives of the parameters
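For completeness, here is how I exercise this function end to end. It is a minimal sketch: the hidden-layer size, random weights, and sigmoid forward pass are my own stand-ins, not the chapter's trained network:

import numpy as np

np.random.seed(42)

X = np.array([[0, 0, 1, 1],
              [0, 1, 0, 1]])  # (2, 4): XOR inputs, samples as columns
Y = np.array([[0, 1, 1, 0]])  # (1, 4): XOR targets

n_h = 2                       # hypothetical hidden-layer size
W1 = np.random.rand(n_h, 2)   # (2, 2)
b1 = np.zeros((n_h, 1))
W2 = np.random.rand(1, n_h)   # (1, 2)
b2 = np.zeros((1, 1))

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

out_h = sigmoid(np.dot(W1, X) + b1)      # (2, 4)
out_y = sigmoid(np.dot(W2, out_h) + b2)  # (1, 4)

dW1, db1, dW2, db2 = backward_propagation(X, Y, out_h, out_y, W2)
print(dW1.shape, dW2.shape)  # (2, 2) (1, 2) -- match W1 and W2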