选择最具周边点的最有效的方法 [英] Most efficient way to select point with the most surrounding points

查看:162
本文介绍了选择最具周边点的最有效的方法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

注意:问题底部有一个主要的修改 - 请检查

问题

h3>



假设我有一组要点:



我想在半径(即一个圆圈)或(即一个正方形)的二维点。我将其称为最密集的点功能。



对于这个问题中的图表,我将周围的区域表示为圆圈。在上图中,中间点的周围区域显示为绿色。该中间点具有半径内的所有点的最多周边点,并且将由

我所尝试过的



解决这个问题的一个可行方法是使用范围搜索解决方案;



然后每百万点()需要执行范围搜索。最糟糕的时间,其中 是点在范围内返回,对于以下点树类型为真:




  • 两维的kd-trees(实际上它们稍微差一些, at ),

  • 2d范围树,
  • 四叉树,其最坏情况时间为



<因此,对于半径范围内的点,它给出了。这产生了超过一万亿次的操作!



任何想法都可以通过更高效,更准确的方式实现,这样我就可以找到一个群体最具周围点的点在合理的时间内(最好是

编辑



原来上面的方法是正确的!我只需要帮助实施它。



<半>解决方案



如果我使用2d范围的树:


  • 范围报告查询花费,返回积分,

  • 对于具有分数级联(也称为分层范围树)的范围树的复杂性为,

  • img src =https://chart.googleapis.com/chart?cht=tx&chl=O%28log%28n%29%7D%2Bk%29alt =O(log(n)+ k)>>

  • 此外,如果我执行范围计数查询(即,我不报告每个点),那么它会花费。



我会在每个点上执行此操作 - 生成 $ googleapis.com/chart?cht=tx&chl=O%28n+log%28n%29%29alt =O(n log(n))>复杂度我想要的!

问题



然而,我无法弄清楚如何编写计数代码查询二维分层范围树。

我找到了。这允许一维范围查询而不是二维查询。 (这是更高效,更简单,并在正方形邻域的情况下,不需要分数级联):


  1. 按Y坐标排序数组S。

  2. 前进3个指向数组S的指针:一个(C)用于当前检查的(中心)点;另一个,距离> C小于C的最近点A(稍前一点);和最后一个B(稍微后面),​​用于距离最远的点R

  3. 将A指向的点插入到订单统计树(按坐标X排序)并从该树中删除B指向的点。使用此树从C找到距左边/右边的距离为R的点,并使用这些点在树中的位置差异来获得C周围的方形区域中的点数。使用 上一步选择最包围点的结果。

如果旋转点(或仅交换XY坐标),以便占用区域的宽度不大于其高度。此外,如果树中的元素太多,以至于它不适合CPU缓存(这不太可能只有100万个点),您还可以将点分割为垂直切片(具有R大小的重叠)和分割处理切片。这个算法(优化与否)具有时间复杂度O(n log n)。

对于圆形邻域(如果R不是太大并且点均匀分布),你可以带有几个矩形的近似圆:



< img src =https://i.stack.imgur.com/Gpumw.pngalt =approximation circle>

在这种情况下,步骤算法2应该使用更多的指针来允许从几棵树插入/移除。在步骤3中,您应该在适当距离(<= R)的点附近进行线性搜索,以区分圆圈内的点与其外的点。 其他方式处理圆形的邻域是用相等高度的矩形近似圆形(但这里的圆应该被分割成更多的部分)。这导致更简单的算法(使用排序数组而不是顺序统计树):


  1. 将点所占据的区域切割为水平切片,按Y排序切片,然后按X排列切片内的点。

  2. 对于每个切片中的每个点,假定它是中心点并执行步骤3。 b $ b
  3. 对于每个附近的切片,使用二分查找来找到欧几里德距离接近于R的点,然后使用线性搜索来从外点中指示内点。停止线性搜索,其中切片完全位于圆内,并通过数组中位置的差异对剩余点进行计数。
  4. 使用上一步的结果选择最包围点。
  5. li>

该算法允许前面提到的优化以及分数级联。


N.B: there's a major edit at the bottom of the question - check it out

Question

Say I have a set of points:

I want to find the point with the most points surrounding it, within radius (ie a circle) or within (ie a square) of the point for 2 dimensions. I'll refer to it as the densest point function.

For the diagrams in this question, I'll represent the surrounding region as circles. In the image above, the middle point's surrounding region is shown in green. This middle point has the most surrounding points of all the points within radius and would be returned by the densest point function.

What I've tried

A viable way to solve this problem would be to use a range searching solution; this answer explains further and that it has " worst-case time". Using this, I could get the number of points surrounding each point and choose the point with largest surrounding point count.

However, if the points were extremely densely packed (in the order of a million), as such:

then each of these million points () would need to have a range search performed. The worst-case time , where is the number of points returned in the range, is true for the following point tree types:

  • kd-trees of two dimensions (which are actually slightly worse, at ),
  • 2d-range trees,
  • Quadtrees, which have a worst-case time of

So, for a group of points within radius of all points within the group, it gives complexity of for each point. This yields over a trillion operations!

Any ideas on a more efficient, precise way of achieving this, so that I could find the point with the most surrounding points for a group of points, and in a reasonable time (preferably or less)?

EDIT

Turns out that the method above is correct! I just need help implementing it.

(Semi-)Solution

If I use a 2d-range tree:

  • A range reporting query costs , for returned points,
  • For a range tree with fractional cascading (also known as layered range trees) the complexity is ,
  • For 2 dimensions, that is ,
  • Furthermore, if I perform a range counting query (i.e., I do not report each point), then it costs .

I'd perform this on every point - yielding the complexity I desired!

Problem

However, I cannot figure out how to write the code for a counting query for a 2d layered range tree.

I've found a great resource (from page 113 onwards) about range trees, including 2d-range tree psuedocode. But I can't figure out how to introduce fractional cascading, nor how to correctly implement the counting query so that it is of O(log n) complexity.

I've also found two range tree implementations here and here in Java, and one in C++ here, although I'm not sure this uses fractional cascading as it states above the countInRange method that

It returns the number of such points in worst case * O(log(n)^d) time. It can also return the points that are in the rectangle in worst case * O(log(n)^d + k) time where k is the number of points that lie in the rectangle.

which suggests to me it does not apply fractional cascading.

Refined question

To answer the question above therefore, all I need to know is if there are any libraries with 2d-range trees with fractional cascading that have a range counting query of complexity so I don't go reinventing any wheels, or can you help me to write/modify the resources above to perform a query of that complexity?

Also not complaining if you can provide me with any other methods to achieve a range counting query of 2d points in in any other way!

解决方案

I suggest using plane sweep algorithm. This allows one-dimensional range queries instead of 2-d queries. (Which is more efficient, simpler, and in case of square neighborhood does not require fractional cascading):

  1. Sort points by Y-coordinate to array S.
  2. Advance 3 pointers to array S: one (C) for currently inspected (center) point; other one, A (a little bit ahead) for nearest point at distance > R below C; and the last one, B (a little bit behind) for farthest point at distance < R above it.
  3. Insert points pointed by A to Order statistic tree (ordered by coordinate X) and remove points pointed by B from this tree. Use this tree to find points at distance R to the left/right from C and use difference of these points' positions in the tree to get number of points in square area around C.
  4. Use results of previous step to select "most surrounded" point.

This algorithm could be optimized if you rotate points (or just exchange X-Y coordinates) so that width of the occupied area is not larger than its height. Also you could cut points into vertical slices (with R-sized overlap) and process slices separately - if there are too many elements in the tree so that it does not fit in CPU cache (which is unlikely for only 1 million points). This algorithm (optimized or not) has time complexity O(n log n).

For circular neighborhood (if R is not too large and points are evenly distributed) you could approximate circle with several rectangles:

In this case step 2 of the algorithm should use more pointers to allow insertion/removal to/from several trees. And on step 3 you should do a linear search near points at proper distance (<=R) to distinguish points inside the circle from the points outside it.

Other way to deal with circular neighborhood is to approximate circle with rectangles of equal height (but here circle should be split into more pieces). This results in much simpler algorithm (where sorted arrays are used instead of order statistic trees):

  1. Cut area occupied by points into horizontal slices, sort slices by Y, then sort points inside slices by X.
  2. For each point in each slice, assume it to be a "center" point and do step 3.
  3. For each nearby slice use binary search to find points with Euclidean distance close to R, then use linear search to tell "inside" points from "outside" ones. Stop linear search where the slice is completely inside the circle, and count remaining points by difference of positions in the array.
  4. Use results of previous step to select "most surrounded" point.

This algorithm allows optimizations mentioned earlier as well as fractional cascading.

这篇关于选择最具周边点的最有效的方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆