带有距离功能的组的最大直径 [英] Biggest diameter of a set with a distance function

查看:110
本文介绍了带有距离功能的组的最大直径的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一组元素,它们之间的距离函数满足三角形不等式.

I have a set of elements with a distance function between elements satisfying the triangle inequality.

我想找到相距最大的一对元素.

I want to find the pair of elements separated by the greatest distance.

有什么已知的解决方案比尝试所有配对更好吗?

Is there any known solution better than trying all pairs?

推荐答案

如果测量从点a到点b,c和d的距离,您会发现| ab |. + | ac | < | ad |,那么您知道| bc |比| ad |短,并且不需要测量| bc |.因此,并非所有的线对都需要检查以找到最长的距离.

If you measure the distance from point a to points b, c and d, and you find that |ab| + |ac| < |ad|, then you know that |bc| is shorter than |ad|, and there's no need to measure |bc|. So not all pairs need to be checked to find the longest distance.

可能的算法是:
首先测量从点a到所有其他点的距离,找到距离a最远的点n,然后给出| ab | + | ax |的所有对b,x. < |一个|距离| ab | + | ax | (因为这是它们的最大可能距离).
对点b进行相同的操作,仅测量尚未设置的距离.检查是否找到了新的最大值,然后再次给出| bc | + | bx |的所有对c,x. <最大距离| bc | + | bx |.
继续对点c,d,...进行此操作

A possible algorithm would be:
Start by measuring the distance from point a to all other points, find the point n which is furthest away from a, and then give all pairs b,x for which |ab|+|ax| < |an| the distance |ab|+|ax| (because that is their maximum possible distance).
Do the same for point b, measuring only those distances which haven't yet been set. Check if you've found a new maximum, and then again give all pairs c,x for which |bc|+|bx| < MAX the distance |bc|+|bx|.
Go on doing this for points c, d, ...

在最佳情况下,仅经过N-1次测量(如果| ax |是距a的任何其他距离的两倍,则可以在一组N点中找到最长的距离).在最坏的情况下,您需要测量每一对(如果最短的距离超过最长的距离的一半,或者如果您跑分的顺序不走运).

In the best case scenario, you could find the longest distance in a set of N points after just N-1 measurements (if |ax| is twice as long as any other distance from a). In the worst case, you would need to measure every single pair (if the shortest distance is more than half of the longest distance, or if you are unlucky in the order in which you run through the points).

如果要将距离测量的数量减少到绝对最小值,并且对于每个未知距离x,y,请检查每个先前存储的值| ax | + | ay |,| bx | + | by |,| cx | + | cy | ...以查看它是否小于当前最大值,并因此可以用作| xy |的值,因此大大减少了测量次数.

If you want to reduce the number of distance measurements to the absolute minimum, and for every unknown distance x,y you check every previously stored value |ax|+|ay|, |bx|+|by|, |cx|+|cy| ... to see whether it's smaller than the current maximum and can thus be used as a value for |xy|, the number of measurements is reduced substantially.

在方形2D空间中的1000个随机点上运行此算法(通常需要499,500次测量),返回的最大距离为2,000至10,000次测量(或总测量值的0.4%至2%,平均约为1 %).

Running this algorithm on 1000 random points in a square 2D space, which would normally require 499,500 measurements, returns the maximum distance with between 2,000 and 10,000 measurements (or between 0.4% and 2% of the total, with an average around 1%).

这不一定意味着该算法在实践中比测量每个距离要快得多;这取决于将测量与避免测量所需的回路,增加和比较的组合相比要花费多少.

This doesn't necessarily mean that the algorithm is much faster in practice than measuring every distance; that depends on how expensive the measuring is compared to the combination of loops, additions and comparisons required to avoid the measurements.

正如@mcdowella指出的那样,这种方法的效率随着空间维数的增加而降低.点数也有很大的影响.下表显示了相对于总对数必须执行的测量次数.这些是来自在正方形"空间中随机分布的点的测试所得的平均值(即,所有维度上的坐标都在同一范围内).如您所见,此方法对于2D或3D空间中具有许多点的几何问题最有意义.但是,如果您的数据在某种程度上存在高度偏差,则结果可能会有所不同.

As @mcdowella pointed out, this method becomes less efficient as the number of dimensions of the space increases. The number of points also has a big impact. The table below shows the number of measurements that have to be carried out in relation to the total number of pairs. These are averages from a test with randomly distributed points in a "square" space (i.e. the coordinates in all dimensions are within the same range). As you can see, this method makes most sense for geometrical problems with many points in a 2D or 3D space. However, if your data is highly biased in some way, the results may be different.


       10 points (45 pairs)      100 points (4950 pairs)   1000 points (499500 pairs)
dim    measurem.   % of total    measurem.   % of total    measurem.   % of total

 1      16.6674      37.04         221.17       4.47        4877.97       0.98
 2      22.4645      49.92         346.77       7.01        5346.78       1.07
 3      27.5892      61.31         525.73      10.62        7437.16       1.49
 4      31.9398      70.98         731.83      14.78       12780.02       2.56
 5      35.3313      78.51         989.27      19.99       19457.84       3.90
 6      38.1420      84.76        1260.89      25.47       26360.16       5.28
 7      40.2296      89.40        1565.80      31.63       33221.32       6.65
 8      41.6864      92.64        1859.08      37.56       44073.42       8.82
 9      42.7149      94.92        2168.03      43.80       56374.36      11.29
10      43.4463      96.55        2490.69      50.32       73053.06      14.63
20      44.9789      99.95        4617.41      93.28      289978.20      58.05
30      44.9996      99.999       4936.68      99.73      460056.04      92.10
40                                4949.79      99.99      496893.10      99.48
50                                4949.99      99.9999    499285.80      99.96
60                                                        499499.60      99.9999

正如预期的那样,在更高的维度上测试的结果变得可预测,离群值之间只有百分之几的空间,而在2D中,某些测试用例需要比其他测试用例多30倍的测量.

As expected, the results of the tests become predictable at higher dimensions, with only a few percent between the outliers, while in 2D some test cases required 30 times more measurements than others.

这篇关于带有距离功能的组的最大直径的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆