We are interested in approximating distance between points . For some sets , we write as all unordered pairs of points. Additionally, define
A pair decomposition of is a set of pairs such that
For every ,
For every ,
A pair is well-separated if . A well-separated pair decomposition (WSPD) is a set of pairs where each and is well-separated.
Quadtree Construction
Given a point set , construct a compressed quadtree with root . For some node , write as the corresponding hypercube. Define and as the set of points in .
For two nodes , we write .
To compute the -WSPD of , call .
If and , return
If , swap and
If , return
Return where are children of
Lemma 1
Info
For some pair computed by , we have , where is the parent node of in .
A moment’s thought shows that and . At some point, we call through a sequence of arguments . WLOG assume that and . Take to be the last index where . Note is a descendent of and the larger cell is always refined first, thus . By symmetry this argument holds for .
Lemma 2
Info
Let be a cell of a grid of with diameter . For , there are at cells of distance at most from
Every cell with distance at most from is contained in some hypercube with side length . This hypercube has volume , while has volume . Thus by packing we can fit at most cells in this hypercube, bounding the number of candidate cells.
Lemma 3
Info
For the WSPD generated by , for any pair and , , we have
Note that , since and .
Correctness
Performing structural induction on , it is clear that this will produce a pair of subsets which covers every pair in . Thus covers every pair of .
For some output pair , if both are singletons it is clear that and are well separated. Otherwise , so . Therefore and this is a valid pair decomposition.
For every output pair , by construction we have . Thus we have a -WSPD.
Complexity
The quadtree can be constructed in . The remainder of the algorithm is linear in output size, thus it suffices to bound the number of pairs returned.
For some pair generated by , suppose this was called by , where is the parent of . Charge the cost of computing to . For all such in , any pair was considered but not added, thus , so . By Lemma 1, , thus we case on equality.
Case 1. By Lemma 2, there are at most candidates for . Since has at most children, this charge occurs at most times.
Case 2. Note , thus by identical argument, there are candidates for . This node also has at most children, so we have at most charges.
Case 3. Consider the grid of cells about , and let be the cell containing . Note that , so . Similarly, there are candidates of . Since we only consider immediate children of , there are at most charges of this type.
Assuming is constant, we have charges per , and at most pairs generated, as has nodes. Additionally, the algorithm terminates in time.