从给定的n个点选择了最远的k个点 [英] Chose the farthest k points from given n points

查看:176
本文介绍了从给定的n个点选择了最远的k个点的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我的 N 的分维度的ð的对,我可以计算出所有成对距离,如果需要的是一个集合S。我需要在该组中选择的 K 的点,使得它们的成对距离的总和为最大的。在其他的,稍微数学的话,我想P1,...,PK S中,使得总和(I,J< K)DIST(PI,PJ)是最大的。

I have a set S of n points in dimension d for which I can calculate all pairwise distances if need be. I need to select k points in this set so that the sum of their pairwise distances is maximal. In other, slightly more mathematical words, I want p1, ..., pk in S such that sum(i,j < k) dist(pi, pj) is maximal.

我知道这个问题是关系到<一个href="http://stackoverflow.com/questions/1618398/given-a-set-of-points-how-do-i-find-the-two-points-that-are-farthest-from-each">this 之一(这基本上是和我一样,但对于k = 2),也许到<一个href="http://stackoverflow.com/questions/5492526/choose-the-closest-k-points-from-given-n-points">this 之一(与'最远的',而不是'最近')

I know this question is related to this one (which is basically the same as mine but for k=2) and maybe to this one (with 'farthest' instead of 'closest').

我也不太清楚这件事,但也许所有可能的解决方案已经在凸包所有的点?

I am not too sure about that, but maybe all the possible solutions have all their points in the convex hull ?

任何合理的近似/启发式的罚款。

Any reasonable approximation/heuristic is fine.

虚拟奖励点#1的解决方案,它适用于任何功能,给出了一个得分出四点(其中之一可以是平方距离的总和的平方根)。

Virtual bonus point #1 for a solution which works for any function that gives a score out of the four points (one of which could be the square root of the sum of the squared distances).

虚拟加分#2如果解决方案在Python容易实现+ numpy的/ SciPy的。

Virtual bonus point #2 if the solution is easily implemented in python + numpy/scipy.

推荐答案

您的问题似乎类似于加权最小顶点覆盖问题(这是NP完全问题)。感谢@Gareth里斯低于澄清,我是不正确的理解一个顶点覆盖的关系,你要寻找的组的意见。但是你仍然可以调查点覆盖问题,文学,因为你的问题可能是一起进行讨论,因为他们仍然共享一些功能。

Your problem seemed similar to the weighted minimum vertex cover problem (which is NP-complete). Thanks to @Gareth Rees for the comments below clarifying that I was incorrect in understanding a vertex cover's relationship to the set you're looking for. But you may still investigate the vertex cover problem and literature because your problem might be discussed alongside it, as they still do share some features.

如果你愿意,直径,而不是概括图重的工作,你可以使用你在你的问题联系在一起的最小直径设定的方法。如果您目前的距离测量被称为 D (您想从对方点最远的一个),那么就定义 D'= 1 / D 并解决与 D 的最小距离问题。

If you're willing to work with diameter instead of summed graph weight, you could use the approach for the minimal diameter set that you linked in your question. If your current distance measure is called d (the one for which you want the points furthest from each other) then just define d' = 1/d and solve the minimum distance problem with d'.

此外,还有可能是切割算法,好比说某种形式的图形之间的关系的标准化切和你所寻求的子集。如果你的距离度量作为节点之间的图形质量或姻亲,你也许可以修改现有图形切割目标函数,以配合您的目标函数(寻找K组节点有最大的总和的重量)。

There might also be a relationship between some form of graph cutting algorithm, like say normalized cut, and the subset you seek. If your distance measure is used as the graph weight or affinity between nodes, you might be able to modify an existing graph cutting objective function to match your objective function (looking for the group of k nodes that have maximum summed weight).

这似乎是一个组合地难的问题。你可能会考虑一些简单的像模拟退火。该提案功能可以只选择点是目前在 K -subset随机的,并带着点不在当前的 K -subset。

This seems like a combinatorially difficult problem. You might consider something simple like simulated annealing. The proposal function could just choose at point that's currently in the k-subset at random and replace it randomly with a point not currently in the k-subset.

您将需要一个良好的散热时间表温度来看,可能需要使用再加热成本的函数。但是,这种这是非常简单的程序。只要 N 是相当小的,然后你可以只是不断地随机选择 K -subsets以及对退火一个氏/ code> -subset具有非常大的总距离。

You would need a good cooling schedule for the temperature term and may need to use reheating as a function of cost. But this sort of this is really simple to program. As long as n is reasonably small, you can then just constantly randomly select k-subsets and anneal towards a k-subset with very large total distance.

这只会给你一个近似,但即使​​是确定性的方法可能会解决这个约。

This would only give you an approximation, but even deterministic methods probably will solve this approximately.

下面是在什么模拟退火code可能是第一次黑客攻击。 注意的是,我不是做这个保证。它可以是一个低效率的解决方案,如果计算的距离太硬或问题的实例变得太大。我使用一个固定的冷却速度非常幼稚的几何冷却,你可能还需要鼓捣爱好者的建议不仅仅是随机交换左右节点。

Below is a first hack at what the simulated annealing code might be. Note that I'm not making guarantees about this. It could be an inefficient solution if calculating distance is too hard or the problem instance size grows too large. I'm using very naive geometric cooling with a fixed cooling rate, and you may also want to tinker with a fancier proposal than just randomly swapping around nodes.

all_nodes = np.asarray(...) # Set of nodes
all_dists = np.asarray(...) # Pairwise distances

N = len(all_nodes)
k = 10 # Or however many you want.

def calculate_distance(node_subset, distances):
    # A function you write to determine sum of distances
    # among a particular subset of nodes.    

# Initial random subset of k elements
shuffle = np.random.shuffle(all_nodes) 
current_subset = shuffle[0:k]
current_outsiders = shuffle[k:]

# Simulated annealing parameters.
temp = 100.0
cooling_rate = 0.95
num_iters = 10000

# Simulated annealing loop.
for ii in range(num_iters):
    proposed_subset = current_subset.copy()
    proposed_outsiders =  current_outsiders.copy()

    index_to_swap = np.random.randint(k)
    outsider_to_swap = np.random.randint(N - k)

    tmp = current_subset[index_to_swap]
    proposed_subset[index_to_swap] = current_outsiders[outsider_to_swap]
    proposed_outsiders[outsider_to_swap] = tmp

    potential_change = np.exp((-1.0/temp)*
        calculate_distance(proposed_subset,all_dists)/
        calculate_distance(current_subset, all_dists)) 

    if potential_change > 1 or potential_change >= np.random.rand():
         current_subset = proposed_subset
         current_outsiders = proposed_outsiders

    temp = cooling_rate * temp

这篇关于从给定的n个点选择了最远的k个点的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆