找出彼此最远的k个子集 [英] Find a subset of k most distant point each other

查看:69
本文介绍了找出彼此最远的k个子集的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一组N个点(特别是该点是二进制字符串),对于每个点,我都有一个离散量度(汉明距离),使得给定两个点i和j,Dij是两个点之间的距离.第i个和第j个点.我想找到k个元素的子集(当然,k <N),以使这k个点之间的距离尽可能最大.换句话说,我想要的是找到一种边界点",该边界点"覆盖点空间中的最大面积.如果k = 2,那么答案是微不足道的,因为我可以尝试在距离矩阵中搜索两个最远的元素,而这是两个点,但是当k> 2时,我该如何概括这个问题呢?有什么建议吗?这是NP难题吗?感谢您的回答

I have a set of N points (in particular this point are binary string) and for each of them I have a discrete metric (the Hamming distance) such that given two points, i and j, Dij is the distance between the i-th and the j-th point. I want to find a subset of k elements (with k < N of course) such that the distance between this k points is the maximum as possibile. In other words what I want is to find a sort of "border points" that cover the maximum area in the space of the points. If k = 2 the answer is trivial because I can try to search the two most distant element in the matrix of distances and these are the two points, but how I can generalize this question when k>2? Any suggest? It's a NP-hard problem? Thanks for the answer

推荐答案

一种概括是找到k个点,以使这k个点中的任何两个之间的最小距离尽可能大".

One generalisation would be "find k points such that the minimum distance between any two of these k points is as large as possible".

不幸的是,我认为这很难,因为我认为如果您能有效地做到这一点,就可以有效地找到集团.假设有人给您一个距离矩阵,并要求您找到一个k形斜面.创建另一个矩阵,其中条目1的原始矩阵具有无穷大,而条目1000000的原始矩阵具有任何有限的距离.现在,新矩阵中的一组k点的位置(其中该组中任意两个点之间的最小距离为1000000)对应于原始矩阵中的一组k点,它们彼此相连-一个集团.

Unfortunately, I think this is hard, because I think if you could do this efficiently you could find cliques efficiently. Suppose somebody gives you a matrix of distances and asks you to find a k-clique. Create another matrix with entries 1 where the original matrix had infinity, and entries 1000000 where the original matrix had any finite distance. Now a set of k points in the new matrix where the minimum distance between any two points in that set is 1000000 corresponds to a set of k points in the original matrix which were all connected to each other - a clique.

这种构造没有考虑到点对应于位向量并且它们之间的距离是汉明距离的事实,但是我认为可以扩展以应对这种情况.为了说明能够解决原始问题的程序可以用来查找团伙,我需要说明,给定一个邻接矩阵,我可以为每个点构造一个位向量,以便图中的点对成对,依此类推在邻接矩阵中具有1,彼此之间的距离大约为A,图中未连接的成对点彼此之间的距离为B,其中A> B.请注意,A可能与B非常接近.,则三角形不等式将迫使情况如此.一旦证明了这一点,k都指向彼此之间的距离A(因此,距离A最小,并且距离的总和为k(k-1)A/2)将对应于集团,因此程序找到了点会发现集团.

This construction does not take account of the fact that the points correspond to bit-vectors and the distance between them is the Hamming distance, but I think it can be extended to cope with this. To show that a program capable of solving the original problem can be used to find cliques I need to show that, given an adjacency matrix, I can construct a bit-vector for each point so that pairs of points connected in the graph, and so with 1 in the adjacency matrix, are at distance roughly A from each other, and pairs of points not connected in the graph are at distance B from each other, where A > B. Note that A could be quite close to B. In fact, the triangle inequality will force this to be the case. Once I have shown this, k points all at distance A from each other (and so with minimum distance A, and a sum of distances of k(k-1)A/2) will correspond to a clique, so a program finding such points will find cliques.

要做到这一点,我将使用长度为kn(n-1)/2的位向量,其中k将随n增长,因此位向量的长度可能多达O(n ^ 3).我可以避免这样做,因为这仍然只是n中的多项式.我将每个位向量划分为长度为k的n(n-1)/2个字段,其中每个字段负责表示两点之间的连接或不连接.我声称存在一组长度为k的位向量,因此,这k个长位向量之间的所有距离都大致相同,不同之处在于它们中的两个比另一个更靠近.我还声称,存在一组长度为k的位向量,因此它们之间的所有距离都大致相同,只是它们中的两个比其他的距离更远.通过在这两个不同的集合之间进行选择,并通过将更近或更远的一对分配给拥有位向量中n(n-1)/2个字段的当前位字段的两个点,我可以创建一组向量具有所需的距离模式.

To do this I will use bit-vectors of length kn(n-1)/2, where k will grow with n, so the length of the bit-vectors could be as much as O(n^3). I can get away with this because this is still only polynomial in n. I will divide each bit-vector into n(n-1)/2 fields each of length k, where each field is responsible for representing the connection or lack of connection between two points. I claim that there is a set of bit-vectors of length k so that all of the distances between these k-long bit-vectors are roughly the same, except that two of them are closer together than the others. I also claim that there is a set of bit-vectors of length k so that all of the distances between them are roughly the same, except that two of them are further apart than the others. By choosing between these two different sets, and by allocating the nearer or further pair to the two points owning the current bit-field of the n(n-1)/2 fields within the bit-vector I can create a set of bit-vectors with the required pattern of distances.

我认为这些存在是因为我认为有一种结构很可能创造出这种模式.创建长度为k的n个随机位向量.任何两个这样的位向量的预期汉明距离为k/2,方差为k/4,因此标准偏差为sqrt(k)/2.对于大k,我们期望不同的距离合理地相似.要在此集合中创建两个非常靠近的点,请复制一个点.要创建两个相距很远的点,请使一个不为另一个(其中一个为0,另一个为1,反之亦然).

I think these exist because I think there is a construction that creates such patterns with high probability. Create n random bit-vectors of length k. Any two such bit-vectors have an expected Hamming distance of k/2 with a variance of k/4 so a standard deviation of sqrt(k)/2. For large k we expect the different distances to be reasonably similar. To create within this set two points that are very close together, make one a copy of the other. To create two points that are very far apart, make one the not of the other (0s in one where the other has 1s and vice versa).

给出任意两个点,它们相互之间的预期距离将为(n(n-1)/2-1)k/2 + k(如果它们应该相距较远)和(n(n-1)/2 -1)k/2(如果应该将它们靠得很近),我声称没有证据表明,通过使k足够大,期望的差将胜过随机变异性,并且我得到的距离将非常接近A且相当我需要多少B.

Given any two points their expected distance from each other will be (n(n-1)/2 - 1)k/2 + k (if they are supposed to be far apart) and (n(n-1)/2 -1)k/2 (if they are supposed to be close together) and I claim without proof that by making k large enough the expected difference will triumph over the random variability and I will get distances that are pretty much A and pretty much B as I require.

这篇关于找出彼此最远的k个子集的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆