Find subset with K elements that are closest to each other

Views: 103 Posted: 2015/11/30 14:55:03 python algorithm

Problem Description

Given an array of N integers, how can you efficiently find a subset of K elements that are closest to each other?

Let the closeness of a subset (x1, x2, x3, ..., xK) be defined as:

$$C = \sum_{1 \le i < j \le K} |x_i - x_j|$$

Constraints:

2 <= N <= 10^5

2 <= K <= N

The array may contain duplicates and is not guaranteed to be sorted.
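As a minimal sketch, the definition can be written directly in Python. The helper name `closeness_by_definition` is ours, not the question's. It sums |x_i - x_j| over every unordered pair, so the order of the input does not matter:

```python
from itertools import combinations


def closeness_by_definition(subset):
    """Sum of |x_i - x_j| over all pairs i < j (hypothetical helper name)."""
    return sum(abs(x - y) for x, y in combinations(subset, 2))


# First example from the question: pairs (10,20), (10,30), (20,30)
print(closeness_by_definition([10, 20, 30]))  # 40
```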

My brute-force solution is very slow for large N, and it doesn't check whether there is more than one solution:

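N = int(input())
K = int(input())
assert 2 <= N <= 10**5
assert 2 <= K <= N
a = []
for i in range(N):
    a.append(int(input()))
a.sort()

minimum = float("inf")  # smallest closeness seen so far
startindex = 0          # start of the best window found so far

# Only runs of K adjacent elements of the sorted array are tried
for i in range(N - K + 1):
    last = i + K
    tmp = 0
    for j in range(i, last):
        for l in range(j + 1, last):
            tmp += abs(a[j] - a[l])
            if tmp > minimum:
                break  # leaves only the inner loop...
        if tmp > minimum:
            break      # ...so stop the outer loop as well

    if tmp < minimum:
        minimum = tmp
        startindex = i  # the best subset is a[startindex:startindex + K]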

Examples:

N = 7
K = 3
array = [10,100,300,200,1000,20,30]
result = [10,20,30]

N = 10
K = 4
array = [1,2,3,4,10,20,30,40,100,200]
result = [1,2,3,4]

Solution

Your current solution is O(NK^2) (assuming K > log N). With some analysis, I believe you can reduce this to O(NK).

The closest set of size K will consist of elements that are adjacent in the sorted list. You have to sort the array first anyway, so the analysis below assumes that each run of K numbers is sorted, which allows the double sum to be simplified.
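The contiguity claim is stated here rather than proved. A quick empirical spot check (not a proof) compares two searches on small random arrays. The first is exhaustive over every K-element subset, and the second considers only the sorted windows. All helper names below are hypothetical:

```python
import random
from itertools import combinations


def pair_sum(xs):
    """Closeness from the definition: sum of |x - y| over all pairs."""
    return sum(abs(x - y) for x, y in combinations(xs, 2))


def best_over_all_subsets(a, K):
    """Exhaustive minimum over every K-element subset (tiny inputs only)."""
    return min(pair_sum(s) for s in combinations(a, K))


def best_over_windows(a, K):
    """Minimum over the N-K+1 runs of K adjacent elements of the sorted array."""
    s = sorted(a)
    return min(pair_sum(s[i:i + K]) for i in range(len(s) - K + 1))


random.seed(0)
for _ in range(500):
    N = random.randint(2, 8)
    K = random.randint(2, N)
    a = [random.randint(0, 20) for _ in range(N)]
    assert best_over_all_subsets(a, K) == best_over_windows(a, K), (a, K)
print("sorted windows matched every exhaustive search")
```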

Assuming that the array is sorted such that x[j] >= x[i] when j > i, we can rewrite your closeness metric to eliminate the absolute value:

$$C = \sum_{1 \le i < j \le K} (x_j - x_i)$$

Next we rewrite your notation as a double summation with simple bounds:

$$C = \sum_{i=1}^{K-1} \sum_{j=i+1}^{K} (x_j - x_i)$$
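Both rewrites can be checked numerically. In the sketch below (hypothetical helper names), `closeness_abs` follows the definition. `closeness_double_sum` drops the absolute value and uses the simple bounds, which is valid only because the window is sorted first:

```python
def closeness_abs(x):
    """Definition: sum of |x_i - x_j| over all pairs (any order)."""
    K = len(x)
    return sum(abs(x[i] - x[j]) for i in range(K) for j in range(i + 1, K))


def closeness_double_sum(x):
    """Double sum with simple bounds; valid only when x is sorted ascending.

    Python indices are 0-based, so i runs over 0..K-2 and j over i+1..K-1,
    matching i = 1..K-1 and j = i+1..K in the 1-based formula.
    """
    K = len(x)
    return sum(x[j] - x[i] for i in range(K - 1) for j in range(i + 1, K))


window = sorted([30, 10, 20, 25])
print(closeness_abs(window), closeness_double_sum(window))  # 65 65
```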

Notice that we can rewrite the inner distance between x[i] and x[j] as a third summation:

$$C = \sum_{i=1}^{K-1} \sum_{j=i+1}^{K} \sum_{l=i}^{j-1} d_l$$

where I've used d[l] to simplify the notation going forward:

$$d_l = x_{l+1} - x_l$$
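Here is a short sketch of the telescoping step, using hypothetical names. Each x[j] - x[i] becomes the sum of the adjacent gaps between them, so the closeness can be computed from the gap list alone:

```python
def closeness_triple_sum(x):
    """Triple sum over adjacent gaps d[l] = x[l+1] - x[l]; x must be sorted.

    0-based: x[j] - x[i] == sum(d[i:j]), i.e. the gaps l = i..j-1.
    """
    K = len(x)
    d = [x[l + 1] - x[l] for l in range(K - 1)]
    return sum(sum(d[l] for l in range(i, j))
               for i in range(K - 1) for j in range(i + 1, K))


window = [10, 20, 25, 30]
d = [window[l + 1] - window[l] for l in range(len(window) - 1)]
print(d)                                   # [10, 5, 5]
print(window[3] - window[0], sum(d[0:3]))  # 20 20
print(closeness_triple_sum(window))        # 65
```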

Notice that d[l] is the distance between the adjacent elements x[l] and x[l+1] in the list. Look at the structure of the inner two summations for a fixed i:

j=i+1         d[i]
j=i+2         d[i] + d[i+1]
j=i+3         d[i] + d[i+1] + d[i+2]
...
j=K=i+(K-i)   d[i] + d[i+1] + d[i+2] + ... + d[K-1]

Notice the triangular structure of the inner two summations. This allows us to rewrite the inner two summations as a single summation in terms of the distances of adjacent terms:

total: (K-i)*d[i] + (K-i-1)*d[i+1] + ... + 2*d[K-2] + 1*d[K-1]
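The triangular table amounts to counting how often each gap appears. The sketch below uses hypothetical helpers and 0-based indices, so the 1-based weight K - l becomes K - 1 - l. For every fixed i, it compares the nested form with the weighted single sum:

```python
def inner_nested(d, K, i):
    """Inner two sums for a fixed 0-based i: add sum(d[i:j]) for j = i+1..K-1."""
    return sum(sum(d[i:j]) for j in range(i + 1, K))


def inner_weighted(d, K, i):
    """The same quantity as a single sum over the gaps.

    Gap d[l] appears in sum(d[i:j]) for every j > l, i.e. K - 1 - l times in
    0-based indexing, which is the (K - l) weight of the 1-based formula.
    """
    return sum((K - 1 - l) * d[l] for l in range(i, K - 1))


d = [10, 5, 5]  # gaps of the sorted window [10, 20, 25, 30], so K = 4
for i in range(len(d)):
    print(i, inner_nested(d, 4, i), inner_weighted(d, 4, i))
```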

which reduces the total sum to:

$$C = \sum_{i=1}^{K-1} \sum_{l=i}^{K-1} (K-l)\, d_l$$

Now we can look at the structure of this double summation:

i=1     (K-1)*d[1] + (K-2)*d[2] + (K-3)*d[3] + ... + 2*d[K-2] + d[K-1]
i=2                  (K-2)*d[2] + (K-3)*d[3] + ... + 2*d[K-2] + d[K-1]
i=3                               (K-3)*d[3] + ... + 2*d[K-2] + d[K-1]
...
i=K-2                                                2*d[K-2] + d[K-1]
i=K-1                                                           d[K-1]

Again, notice the triangular pattern. The total sum then becomes:

1*(K-1)*d[1] + 2*(K-2)*d[2] + 3*(K-3)*d[3] + ... + (K-2)*2*d[K-2] 
  + (K-1)*1*d[K-1]

Or, written as a single summation:

$$C = \sum_{l=1}^{K-1} l\,(K-l)\, d_l$$
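The final weights l*(K-l) can be spot-checked against the pairwise definition on random sorted windows. This is a sketch with hypothetical function names, not the answer's code:

```python
import random
from itertools import combinations


def closeness_by_pairs(x):
    """Definition: sum of |x_i - x_j| over all pairs."""
    return sum(abs(p - q) for p, q in combinations(x, 2))


def closeness_by_gaps(x):
    """Single sum C = sum_{l=1}^{K-1} l*(K-l)*d_l; x must be sorted ascending.

    With 1-based l, d_l = x_{l+1} - x_l is x[l] - x[l-1] in Python indexing.
    """
    K = len(x)
    return sum(l * (K - l) * (x[l] - x[l - 1]) for l in range(1, K))


random.seed(42)
for _ in range(1000):
    x = sorted(random.randint(-100, 100) for _ in range(random.randint(2, 10)))
    assert closeness_by_pairs(x) == closeness_by_gaps(x), x

print(closeness_by_gaps([10, 20, 25, 30]))  # 65
```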

This compact single summation of adjacent differences is the basis for a more efficient algorithm:

1. Sort the array: O(N log N).
2. Compute the differences between adjacent elements: O(N).
3. For each of the N-K+1 windows of K-1 consecutive differences, calculate the sum above: O(NK).

Note that the second and third steps could be combined, although in Python your mileage may vary.
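This sketch shows one way the second and third steps might be fused (the name `closest_fused` is hypothetical). The gaps are read straight from the sorted list inside the window loop instead of being stored in a separate `diff` list. The cost is still O(NK) after the sort:

```python
def closest_fused(a, K):
    """Steps 2 and 3 fused: gaps are computed on the fly (hypothetical helper)."""
    s = sorted(a)
    best_start, best_val = 0, None
    for start in range(len(s) - K + 1):
        # closeness of s[start:start+K] as sum_{l=1}^{K-1} l*(K-l)*gap_l
        val = 0
        for l in range(1, K):
            val += l * (K - l) * (s[start + l] - s[start + l - 1])
        if best_val is None or val < best_val:
            best_start, best_val = start, val
    return s[best_start:best_start + K]


print(closest_fused([10, 100, 300, 200, 1000, 20, 30], 3))  # [10, 20, 30]
```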

The code:

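def closeness(diff, K):
    # Closeness of one window from its K-1 adjacent differences:
    # sum over l = 1..K-1 of l*(K-l)*d[l]; enumerate() counts from 0, hence i+1
    acc = 0  # integer accumulator keeps the result exact for integer input
    for i, v in enumerate(diff):
        acc += (i + 1) * (K - (i + 1)) * v
    return acc

def closest(a, K):
    a = sorted(a)  # sort a copy so the caller's list is left untouched
    N = len(a)
    # diff[i] is the gap between neighbours a[i] and a[i+1]
    diff = [a[i + 1] - a[i] for i in range(N - 1)]

    # The window starting at ind is described by diff[ind:ind+K-1]
    min_ind = 0
    min_val = closeness(diff[0:K - 1], K)

    for ind in range(1, N - K + 1):
        cl = closeness(diff[ind:ind + K - 1], K)
        if cl < min_val:  # strict <, so the leftmost optimal window wins ties
            min_ind = ind
            min_val = cl

    return a[min_ind:min_ind + K]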
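The question also notes that the brute force does not report when there is more than one solution. One possible extension keeps every window that attains the minimum instead of only the first. This sketch assumes ties should be returned as separate windows, and `all_closest` is a hypothetical name:

```python
def all_closest(a, K):
    """Every window of K adjacent sorted elements that attains the minimum.

    Hypothetical extension of the answer's approach that reports ties.
    """
    s = sorted(a)
    weights = [l * (K - l) for l in range(1, K)]  # l*(K-l) for l = 1..K-1
    best, starts = None, []
    for start in range(len(s) - K + 1):
        # gap l+1 (1-based) of this window is s[start+l+1] - s[start+l]
        val = sum(w * (s[start + l + 1] - s[start + l])
                  for l, w in enumerate(weights))
        if best is None or val < best:
            best, starts = val, [start]
        elif val == best:
            starts.append(start)
    return [s[i:i + K] for i in starts]


print(all_closest([1, 2, 3, 10, 11, 12], 3))  # [[1, 2, 3], [10, 11, 12]]
```

With duplicate values, two tied windows can contain the same numbers. They are still reported separately because they start at different positions.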
