NN hidden layer to logistic regression model same as original NN?

In the ranking section of the feed-based-system chapter, under “Stacking Models and online learning”, the author suggests that a Neural Network can be used for this stacking technique by feeding its last hidden layer into a logistic regression model:

Similarly, for neural networks, rather than predicting the probability of events, you can just plug in the output of the last hidden layer as features into the logistic regression models.

Neural Networks are essentially stacked logistic regression models (or near-equivalents, depending on whether the activation function is sigmoid or something else like tanh or ReLU). Removing the last layer, which predicts a single value, and replacing it with a logistic regression layer that also predicts a single value seems to leave you with essentially the same model?
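To make the equivalence in the question concrete, here is a minimal NumPy sketch (all weights are hypothetical random stand-ins, not from the book): a network whose output layer is a sigmoid unit produces exactly the same prediction as a logistic regression whose input features are the last hidden layer's activations and whose weights are the final layer's weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Hypothetical 2-layer network: one hidden layer, then a sigmoid output.
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=3)   # input -> hidden
w2, b2 = rng.normal(size=3), rng.normal()              # hidden -> output

x = rng.normal(size=4)
h = np.tanh(x @ W1 + b1)          # last-hidden-layer activations

# Full network prediction...
p_nn = sigmoid(h @ w2 + b2)

# ...is identical to a logistic regression over features h with
# weights w2 and intercept b2.
p_lr = sigmoid(np.dot(w2, h) + b2)

assert np.isclose(p_nn, p_lr)
```

So with no further training, the swap changes nothing; the question is what online learning adds on top of this.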

Does this “online learning” concept mean that if the NN has L layers, you would train all L layers offline, then freeze the first L-1 layers and retrain only the last layer online?

If there were no online learning, then both options — adding a final layer in the NN or using a separate logistic regression model — would mean pretty much the same thing, and you are right that it wouldn't make much sense to do it that way.

But in this particular discussion, the final model being proposed is a logistic regression model with online learning: essentially a function f(raw features, tree features, NN features). In this online-learning setting, using the last hidden layer as features should definitely be advantageous, because it lets the online model learn updated weights for every last-hidden-layer feature, rather than a single weight for one feature (the NN's final output) if we were just using the NN model's output as a feature.
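A minimal sketch of that setup, under assumed details (random stand-in weights, a synthetic label, plain per-example SGD; the book does not prescribe these specifics): the hidden layers are frozen after offline training, their activations are concatenated with the raw features, and a logistic regression over the combined feature vector is updated one example at a time as new events stream in.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)

# Frozen, offline-trained hidden layer (weights are hypothetical stand-ins).
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=3)

def nn_features(x_raw):
    """Last-hidden-layer activations of the frozen network, reused as LR features."""
    return np.tanh(x_raw @ W1 + b1)

class OnlineLogReg:
    """Logistic regression updated one example at a time via SGD."""
    def __init__(self, dim, lr=0.1):
        self.w = np.zeros(dim)
        self.b = 0.0
        self.lr = lr

    def predict(self, feats):
        return sigmoid(feats @ self.w + self.b)

    def update(self, feats, y):
        grad = self.predict(feats) - y   # d(logloss)/d(logit)
        self.w -= self.lr * grad * feats
        self.b -= self.lr * grad

# The stacked model is f(raw features, NN features); tree features
# would be concatenated into the same vector in the same way.
model = OnlineLogReg(dim=4 + 3)
for _ in range(200):
    x_raw = rng.normal(size=4)
    y = float(x_raw[0] + x_raw[1] > 0)            # synthetic label
    feats = np.concatenate([x_raw, nn_features(x_raw)])
    model.update(feats, y)
```

Note that every component of `model.w` — including one weight per hidden unit — gets updated online, which is the advantage over feeding in only the NN's single output probability.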