educative.io

Educative

Partitioning QuadTree by LocationID

Although partitioning DB by LocationID makes perfect sense I am not sure how we can partition QuadTree by locationID. Since Tree has logical flow between parent and child? it would be nice if some can elaborate how this will work?

2 Likes

I have the same confusion about this. Did you figure it out?

@Design_Gurus I have the same doubt. For example if user try to find all the location near by locationID 123 then you can find the gird of that location 123 but how will you find the rest of the nearest location of 123 since location has been distributed by hashing location id and they might be scattered to other servers. Then how the flow will look like? Please help

1 Like

My understanding is each server will have its own quadtree server for a subset of the lcoation Ids. So there is no parent - child linking across servers. So there will be multiple quadtrees and same grid could be present in multiple quadtrees. To find list of locations corresponding to a desired location, we need to search each quadtree and several of them could return a list of locationIds that are in proximity to the desired location. Those all have to be aggregated and returned finally.

6 Likes

@Design_Gurus, I had the same doubt. IMO a tree is only useful as long as we are able to search it. Now as far as I understood it, sharding based on the locationId will put the location on a random server based on the hash function.

Let’s take an example. My location is let’s say called “X”. I have 4 nearby places of interest. We will call them “A”, “B”, “C” & “D”.

Now with a QuadTree on a single server, these locations will be on a single node, or at max on 2 adjacent nodes (a possible example of this is -> 458 extra locations + “A” + “B” in one group and “C” and “D” with another 458unique locations in second group). Here one sub - doubt, is a node = server or a data structure?

Till now this is still manageable. Now, consider that “A”, “B”, “C”, and “D” got mapped to 4 different nodes by let’s say a crazy hash function. How are we supposed to implement the parent child relationship here? Even if we try the doubly linked list it still is a lot of a pain to implement.

And to make matters worse, if in such an example, there are 100 neighboring places, they might get stored in 20 different servers (just a random number). Things would get even more out of hand.

Maybe the approach specified by @AmSh would be a workable thing, but still, can you please elaborate a bit more on your locationId based solution?

We have to query all the servers, as the grid will be distributed to multiple servers, as mentioned in the chapter:

To find places near a location, we have to query all servers and each server will return a set of nearby places. A centralized server will aggregate these results to return them to the user.

This is exactly the approach as described by @ABD. Each location will be placed on a random server, determined by the hash function. So each server will hold a tree. Practically we should not require many servers. The whole quadtree can easily fit on 2-3 servers.

3 Likes

So by

You mean that each server will have that root node which covers the entire world? And then whatever location ids got hashed to it are going in the child nodes?

And when we query each server, it’ll do a sort of DFS on each of these QuadTrees and then an aggregator server will merge these results.

This makes the aggregation operation much shorter so that’s great.

Thanks for your explanation and help!!

Yes. Each server will have the root node and grids under it. For searching each server will perform DFS as you have mentioned.

2 Likes

each server will own some part of quad tree. Means each server will have some set of location and it will store this locations in its quad tree. And then we combine results from all servers to get overall result

You said the whole quadtree can easily fit on 2-3 servers. What do you mean? You mean the quadtree of the entire world?

Thanks for the explanation @Design_Gurus and @ABD! This cleared a lot of questions I had on Distributed QUAD tree.

I think there is no sense of querying all the servers as @ABD and @Design_Gurus suggested. Looks like you miss the main purpose of sharding - split the load, not multiply it.

Healthy man’s partiotioning: 3000 queries have come at the same time, they have been split by 10 sharded servers, each server has received ~ 300 queries.

Partiotioning of the smoker: 3000 queries have come at the same time, each query has been sent to each of the servers, each server has received 3000 queries, each server has got overwhelmed.