educative.io

Why are tweet_id and user_id sparse features?

I googled sparse features, and they are defined as features with a lot of missing or null values. Every user should be assigned a user_id, and the same goes for tweets and tweet_id. So my question is: why are they sparse features?

Here, user_id and tweet_id are defined only for users and tweets with high engagement rates (you can see this definition in the feature engineering lesson of this section). In the case of users, these can be celebrities or influencers. Most users and tweets do not meet that bar, so the feature has no value for the large majority of examples. That is why these features are sparse.
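
To make this concrete, here is a minimal sketch in Python. The user IDs, engagement rates, and the 0.5 cutoff are all made up for illustration and do not come from the lesson; the one-hot encoding is just one common way such an ID feature might be represented.

```python
# Hypothetical data: user_id -> engagement rate (made-up numbers).
engagement = {
    "u1": 0.02, "u2": 0.91, "u3": 0.01,
    "u4": 0.03, "u5": 0.88, "u6": 0.00,
}
THRESHOLD = 0.5  # assumed cutoff for "high engagement"

# Only high-engagement users get a slot in the ID vocabulary.
high_engagement = sorted(u for u, rate in engagement.items() if rate >= THRESHOLD)
vocab = {uid: index for index, uid in enumerate(high_engagement)}


def user_id_feature(uid):
    """One-hot vector over the vocabulary; all zeros if uid is not in it."""
    vec = [0] * len(vocab)
    if uid in vocab:
        vec[vocab[uid]] = 1
    return vec


features = {uid: user_id_feature(uid) for uid in engagement}

# Fraction of users for whom the feature carries no value at all.
empty_fraction = sum(1 for v in features.values() if not any(v)) / len(features)
print(vocab)
print(f"{empty_fraction:.0%} of users have an empty user_id feature")
```

In this toy data only two of the six users are in the vocabulary, so the feature is empty for the other four. With real traffic, where celebrities and influencers are a tiny share of all users, the empty share would be far larger.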