educative.io

Log Sampling every 1000th term

As new queries come in, we can log them and also track their frequencies. We can either log every query or sample and log only every 1000th query. For example, if we don’t want to show a term that is searched fewer than 1000 times, it’s safe to log every 1000th searched term.

I’m unclear about what “safe” means in this context. If you sample by logging every 1000th query, you can still miss terms that have been searched more than 1000 times.

Example:
Queries 1-999: “grokking”
Query 1000: “systems”
Queries 1001-1999: “grokking”
Query 2000: “systems”
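
To make the example concrete, here is a minimal sketch that replays that exact stream through a fixed "log every 1000th query" sampler. The function names (`build_stream`, `log_every_nth`) are made up for illustration and are not from the course.

```python
from collections import Counter


def build_stream():
    """Reproduce the example: 'systems' lands exactly on queries 1000 and 2000."""
    block = ["grokking"] * 999 + ["systems"]
    return block * 2


def log_every_nth(queries, n=1000):
    """Log only the queries whose 1-based position is a multiple of n."""
    logged = Counter()
    for position, term in enumerate(queries, start=1):
        if position % n == 0:
            logged[term] += 1
    return logged


stream = build_stream()
true_counts = Counter(stream)
sampled = log_every_nth(stream)

print(true_counts["grokking"], sampled["grokking"])  # 1998 0
print(true_counts["systems"], sampled["systems"])    # 2 2
```

The sampler sees only "systems", even though "grokking" makes up almost all of the traffic.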

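In this stream, “grokking” is searched 1,998 times but never logged, while “systems” is searched only twice and logged both times. You can add jitter, logging the query at a random position within each window of 1000 rather than always the 1000th, to reduce the risk of this happening. But that isn’t called out, so the current design risks missing a term that has been searched more than 1000 times.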
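
A rough sketch of what I mean by jitter, assuming we still log exactly one query per window of 1000 but pick its position at random. The name `log_with_jitter` and the `rng` parameter are my own, not from the course.

```python
import random
from collections import Counter


def log_with_jitter(queries, n=1000, rng=None):
    """Log one query per window of n, at a random offset inside the window."""
    if rng is None:
        rng = random.Random()
    logged = Counter()
    target = 0
    for index, term in enumerate(queries):
        offset = index % n
        if offset == 0:
            # New window: choose which position in it gets logged.
            target = rng.randrange(n)
        if offset == target:
            logged[term] += 1
    return logged


# Same adversarial pattern as the example, repeated over 100 windows.
stream = (["grokking"] * 999 + ["systems"]) * 100
sampled = log_with_jitter(stream, rng=random.Random(0))
print(sampled)
```

With a random offset, each logged query is “grokking” with probability 999/1000, so the sampled counts track the real traffic mix instead of depending on where a term happens to fall in the stream. It still only reduces the risk: a term searched just over 1000 times can be missed by chance.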