educative.io

Search workflow in many index servers

When quadtree in one server, “We will first find the node that contains the user’s location. If that node has enough desired places, we can return them to the user. If not, we will keep expanding to the neighboring nodes (either through the parent pointers or doubly linked list) until either we find the required number of places or exhaust our search based on the maximum radius.”

According to sharding based on locationID, While building our QuadTree, we will iterate through all the places and calculate the hash of each LocationID to find a server where it would be stored.

But when we shard quadtree based on locationID into many server. What’s the detailed search workflow? Nodes are distributed in different servers and we don’t even know how and where to find root node, parent node or child node since servers only have locationID. How can we do the search? @Design_Gurus

8 Likes

I agreed! This strategy of sharding based on LocationID TOTALLY defeats the purpose!!!

The whole point of having a QuadTree is to reduce the effort of searching neighboring nodes, because all places nearby are in the neighboring nodes. But after sharding on LocationID, we have to “iterate through all the places and calculate the hash of each LocationID and find its neighbors…” so we might as well not use the QuadTree???

2 Likes

@Ang_Li, @Franklin,

Yes, sharding based on LocationID will distribute the locations randomly, and to search the neighbors we have to query ALL servers as mentioned in the chapter:

To find places near a location, we have to query all servers, and each server will return a set of nearby places. A centralized server will aggregate these results to return them to the user.

Since, using this sharding scheme, we will get a random distribution of location, we are not going to get any of the issues mentioned in the ‘sharding based on regions’, like no hot regions and no region overflowing. However, then sharding on LocationID has its pros/cons, as we have to query all servers/combine results, which will have higher latencies. Practically partitioning by LocationID might work well, we have to measure. One thing for sure, partitioning by region will not work in the long run.

2 Likes

@Design_Gurus This isn’t really answering the question that you’ve been asked.
I am also confused.

If our tree resides on one server, it is easy to add a new Place, but if the QuadTree is distributed among different servers, first we need to find the grid/server of the new Place and then add it there (discussed in the next section).

If every node of the tree is randomly distributed between servers, the integrity is broken. Unless you store the whole tree structure on every server, or at least enough of the structure to get to every leaf on that server. But then how do you find the leafs of neighboring squares?

2 Likes

@Design_Gurus important clarification is still waiting for answer

2 Likes
1 Like

My understanding is that sharding by LocationID will distribute the locations across multiple servers. Each server then builds its own QuadTree for its subset of Location IDs. To find the locations near your current location, you then query each server with your grid coordinate, and it looks up the locations in its QuadTree. The locations from each server are then combined and ranked, then returned to the client.

Leaf traversals are done by each individual server in its own QuadTree. Traversals are not done amongst servers. Only the results from each server will be combined.

4 Likes

thanks for the explanation, I’ve been breaking my head on this for the last two hours.
They should have mentioned that each server creates a sparsely populated quadtree with the places that hash to it. Each server is then asked “what places do you have for this grid coordinate”. they either return a handful of places or nothing. These results are then aggregated and given to the user.

But the problem I see with this is that we had created the original quadtree such that each leaf node had at most 500 locations. Now with each server having only a subset of the total locations will its grids match the original quadtree?? perhaps not.

@kiliman
Let me add to this explanation … imagine the Quad Tree in one server

After Sharding I believe it will look this :

so that is the reason we need to hit all server since location from particular grid might be spreaded to different server now.
I still believe this approach is not good… since we are increasing lot of computation and wasting lots of space on each server… I still believe making QuadTree on region is much better … for example west coast of usa will be one region and same way east cost … so that now based on use location we can find the region and search in that Quad Tree…

5 Likes

My understanding is that if we have for example 3 quad tree servers and we are looking for nearby places for x, y (Lat/Long user location), the system searchs in the all three servers. In each server it will find the same GridID and same neighbouring GridIDs, and collects the locationIDs gathered from all those found Grids in all those 3 servers and aggregate them as a final result.

Can you answer the question that is asked or make the paragraph more clear Please?