The equation written in the lesson for AdaGrad is w = w - learning_rate / (sqrt(G) + epsilon) * (gradient of cost function)^2, i.e. the gradient of the cost function is squared in the update term.
In the provided implementation, and in various other texts, the gradient in the update term is not squared. Is the squaring in the written equation a typo, or am I misunderstanding something?
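For reference, here is a minimal sketch of the AdaGrad step as it is usually stated (scalar parameter, my own names `lr`, `eps`, `adagrad_step`, not the course's): the squaring happens when the gradient is accumulated into G, while the update itself multiplies by the raw gradient.

```python
import math

def adagrad_step(w, grad, G, lr=0.5, eps=1e-8):
    """One AdaGrad update for a single scalar parameter (illustrative sketch)."""
    G = G + grad ** 2                          # the square lives here, in the accumulator
    w = w - lr / (math.sqrt(G) + eps) * grad   # raw gradient here, not grad ** 2
    return w, G

# Minimize f(w) = w^2, whose gradient is 2w, starting from w = 5.
w, G = 5.0, 0.0
for _ in range(1000):
    w, G = adagrad_step(w, 2 * w, G)
# w has converged close to the minimum at w = 0
```

With the gradient squared in the update term as well, the sign of the step would no longer depend on the sign of the gradient, so the iterate could not descend toward a minimum from both sides.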
Course: https://www.educative.io/collection/6106336682049536/5913266013339648
Lesson: https://www.educative.io/collection/page/6106336682049536/5913266013339648/5494261663924224