educative.io

Why do customers are subscribed on drivers?

We can maintain a list of customers (subscribers) interested in knowing the location of a driver and whenever we have an update in DriverLocationHT for that driver, we can broadcast the current location of the driver to all subscribed customers.

What is the purpose to subscribe customers on nearby drivers?

Drivers can easily skip the current area, a customer doesn’t care about any particular driver, he just wants to call best/nearest one.

Can we simply broadcast all drivers in the area each time without any excessive subscriptions, so that we wouldn’t need a special case to introduce a new driver who entered the area?
A driver can leave the area also, why do we push a user to track the driver?

2 Likes

Customers subscribe to a driver only if they are interested in knowing the driver’s location. Currently, this is only required when a customer opens the Uber’s app. After opening the app, the customer sees the nearby drivers. These nearby drivers are the ones that the customer wants to subscribe to, in other words, the customer wants to have regular updates about these drivers. This can be done though a push or pull model. We can keep a subscription list of each driver and notify all customers who are interested in knowing their location (push model) or customer regularly query the server to always search for nearby driver within an area (pull). The pull model can be implemented with or without the publisher/subscriber mode.

1 Like

The reason for the confusion is because we start off by trying to find a solution of how to send updates of the drivers to the riders. Over here you talk about using a pub/sub method and thats cool and pretty efficient too. However, then you talk about publishing the data for a new driver in the area and the solution to that is querying the QuadTree every 5 seconds. Now if you are already going to be doing that, whats the need of the pub/sub model? Because you will get the updated location for both old and new drivers.

Secondly, if the QuadTree is getting update every 15 secs, why should the customer poll for this data every 5secs. Isnt he going to get the same location?

2 Likes

The customer would be polling the DriverLocation HashTable after getting the initial list of nearby drivers from the QuadTree.

I think it would go something like this:

  1. Customer opens Uber app.
  2. System queries the Quad Tree and provides X drivers that are within the Customer’s current viewport.
  3. Every 5 seconds, Customer polls service for updated driver locations.
  4. System gets updated driver locations from DriverLocationHashTable.
  5. If customer moves map, system has to query the QuadTree again to get a list of nearby drivers.
3 Likes

@Noe_Brito , then how will new drivers will be added as the customer is polling to get updateDriverCordinates once it get the list of drivers, now once new driver is in area, what to do?

2 Likes

I guess it should again add this customer to these new entrying driver’s subcription list.
And here comes a new question, what to do with the driver leave the rider’s area?

I can think of 2 solutions.

  1. Every time the rider pulls the nearby driver list, the backend will do 2 things,
    a. add this rider to the diver’s consumer’s list.
    b. send back this list to the client backend
    Then, the client’s backend can filter the driver is not in the latest list (because we have not delete it from the driver’s list). To clean up the stale data in the driver’s consumer’s list, we need to set a time-to-live parameter for every rider in the list, e.g. after 10min this rider will delete from the list(we can maintain this with a set data structure in case we add duplicate rider in to the same consumer’s list). By this way, we achieve eventually consistence. The cons are we use extra resources in client’s backend and also the publish useless information to rider who does not need it.

  2. Every time the rider pulls the nearby driver list, the backend will do 2 things,
    a. add this rider to the diver’s consumer’s list.
    b. we also maintain a reverse mapping from rider to driver info, let’s call it rider2driver mapping

Every time the rider pulls the nearby driver info and call this new rider2driver mapping, we first delete every this rider from all the driver’s consumer list using the old rider2driver mapping, and then we add the rider to the driver’s consumer list using the new rider2driver mapping.

pros: here is we don’t have stale data any more and will not send useless driver update to rider does not need it.
cons: this will limit our through put, we have more write/update operation on the driver’s consumer list and will have exclusive lock on it(writing lock).

I think performace is more important, so I personally would choose the first solution.

I also find this Tinder series is informative

There, I think they have a relative fixed geo sharding/segment and I guess they assume the density/workload is relative stable even though people moving dynamically across geo segment.

Then in such setup, we only need to maintain the driver’s document in the right geo segment in near real time. Then user can just pull all the driver infomation in its geo segment or nearby geo segments.

The key difference with the approach in the course is:
In the course, we are maintaining driver-rider relationship as the geo segment will change in the quardtree.
Tinder is maintaining a user-geosegment relationship, as the geo segment itself is fixed but the documents in it may move across segments.

in first option, if driver is pulling from server by polling again and again, then how will you send the coordinate update from drivers to customers?
A log in app,
send request to server
gets list of nearby drivers?

Now you again send request,you may get different list of drivers from server?Also server needs to again do quad tree search and find nearby drivers ,isnt it too expensive?
How will you keep track on app, the movement of drivers so that customers can track it live?

I cannot think of a way that client does not quey the quadtree and add new available driver to user’s app.

Unless we also keep track of client who looking for a car in the quadtree, I cannot think of a way that a client does not quey the quadtree and add new available drivers to user’s app but the backend can pubish the new driver info to user in this area.

What do you mean driver is pulling from server by polling again and again?
drivers update coordinate to customer as the way described in the course. Driver update its information to quadtree/DriverLocationHT, and user query the tree and subcribe to driver’s infomation.

You said above,so are you pulling information multiple times,like http polling ?Let us say first request gave us 5 drivers and customer is subscribed there.

  1. Now one request you will send to get 5 drivers updated coordinates
  2. Second request for getting new drivers?

These will be two different requests, right?

then in second request we may get different drivers than first,lots of confusion:(

yes, we will get different drivers in different requests. And that is why I said there will be step b in both approches, which is used to deduplicate/block stale driver infomation.

Adding my two cents.
“Get me drivers nearby me” is not a straightforward problem.
The issue comes from update frequency which is very high in such a case. ( every 3 seconds for 1 driver ).
We had similar problem in our company and we ended up serving this type of query from very familiar REDIS cache.
please read: https://redislabs.com/redis-best-practices/indexing-patterns/geospatial/

databases like Postgres support Geospatial queries but again problem comes from the updates.

if we are going ahead with QuadTree + hash table for latest location kind of approach, then in order to avoid so frequent updates to QuadTree, the solution in the given text seems fine to me ( which is periodically update QuadTree by a separate job ).
But then please don’t do any pub/sub. That’s pointless and confusing thing in the text.
if updates are propagated to QuadTree, “Get me drivers nearby me” query will work fine but with an error of 15 seconds which can be tolerable.

If we’re going with the hash table, I would rather have the following approach:
Key: grid ID
Value: a list of all customers currently subscribed to this grid.
A customer will subscribe to multiple grids (say their current grid + 8 neighbors). As the driver pings their location every three seconds, we find their current grid and send updates to all subscribers of that grid. If they move from one grid to another, we don’t make any changes to the HT; the old grid just stops receiving updates. Only customers modify the HT by initiating/canceling their requests. The quad tree update happens after 15 seconds as before.