The equation written in the lesson for AdaGrad is w = w - learning_rate / (sqrt(G) + epsilon) * (gradient of cost function)^2, i.e. the gradient of the cost function is squared in the update term.
In the provided implementation, and in various other texts, the gradient in the update term is not squared. Is the squaring in the written equation a typo, or am I misunderstanding something?
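For reference, here is a minimal sketch of the AdaGrad step as it is usually stated (scalar parameter, my own names `lr`, `eps`, `adagrad_step`, not the course's): the squaring happens when the gradient is accumulated into G, while the update itself multiplies by the raw gradient.

```python
import math

def adagrad_step(w, grad, G, lr=0.5, eps=1e-8):
    """One AdaGrad update for a single scalar parameter (illustrative sketch)."""
    G = G + grad ** 2                          # the square lives here, in the accumulator
    w = w - lr / (math.sqrt(G) + eps) * grad   # raw gradient here, not grad ** 2
    return w, G

# Minimize f(w) = w^2, whose gradient is 2w, starting from w = 5.
w, G = 5.0, 0.0
for _ in range(1000):
    w, G = adagrad_step(w, 2 * w, G)
# w has converged close to the minimum at w = 0
```

With the gradient squared in the update term as well, the sign of the step would no longer depend on the sign of the gradient, so the iterate could not descend toward a minimum from both sides.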
Course: https://www.educative.io/collection/6106336682049536/5913266013339648
Lesson: https://www.educative.io/collection/page/6106336682049536/5913266013339648/5494261663924224