In Follow up questions, why reducing the sensitivity of click through rate will help with the issue of user's behavior changes?

For the first follow-up question: How do we adapt to user behavior changing over time? In the answer, the third point is Use different loss functions to be less sensitive with click through rates. What I don’t understand is what is the relationship between the click through rate and the user’s behavior changes, and why the less sensitivity to click through rate will help to adapt to the behavior changing. Also, what does the user behavior refer to? Does it refer to user’s preference or just refer to the user’s video watching behavior like what time to watch or what device that the user uses?

Type your question above this line.


Sharing my thoughts here.
There’s a very import concept that’s missing in this course doc, which is background conversion rate, or background CTR, or overall CTR. Here background means the measurement of the global CTR for the whole data set.
Try to imagine such a scenario, where we have a training set of 1000 samples, 20 positive, and 980 negative. Here we can say background CTR of this dataset is 20/1000=2%. This represents the overall performance of the “truth” we’ve fed the model.
Such pre-existing distribution of the labels, kinda acts as a snapshot of the current user’s behaviors. And when user’s behavior changes, such background CTR will change.
On that note, as we are optimizing the model to get better prediction on the positive labels, and if such positive labels are sooo rare, the loss will tend to be very high in most cases, that’s too harsh for the model. That’s why we want to moderate the loss by a factor, that takes consideration of the overall distribution of the labels across the training set, here comes normalized cross entropy. The normalization here is the very moderation that I mentioned above.
As to your points:

  1. what is the relationship between the CTR and the user’s behavior changes?
    CTR, or more specifically, background CTR is a snapshot of overall behavior of users, while it’s hard to represent a drift of some cohort of users unless we split the dataset further.
  2. why the less sensitivity to CTR will help to adapt to behavior changing.
    By introducing losses like normalized cross entropy, we are taking the drift of overall distribution into consideration when calculating loss.

Hope that helps, and if there’s anything sounds weird above, happy to discuss! :slight_smile: