Find subset with K elements that are closest to each other


Problem description


Given an array of integers of size N, how can you efficiently find a subset of size K whose elements are closest to each other?

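Let the closeness of a subset (x1, x2, x3, ..., xK) be defined as the sum of the absolute differences over all pairs:

$$\sum_{1 \le i < j \le K} \left| x_i - x_j \right|$$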

2 <= N <= 10^5

2 <= K <= N

constraints: Array may contain duplicates and is not guaranteed to be sorted.

My brute force solution is very slow for large N, and it doesn't check if there's more than 1 solution:

N = int(input())
K = int(input())
assert 2 <= N <= 10**5
assert 2 <= K <= N
a = []
for i in range(0, N):
    a.append(int(input()))
a.sort()

minimum = float("inf")  # best closeness seen so far
startindex = 0

# Try every window of K adjacent elements in the sorted array
for i in range(0, N-K+1):
    last = i + K
    tmp = 0
    for j in range(i, last):
        for l in range(j+1, last):
            tmp += abs(a[j]-a[l])
            if(tmp > minimum):
                break  # leaves only the innermost loop

    if(tmp < minimum):
        minimum = tmp
        startindex = i #end index = startindex + K?

Examples:

N = 7
K = 3
array = [10,100,300,200,1000,20,30]
result = [10,20,30]

N = 10
K = 4
array = [1,2,3,4,10,20,30,40,100,200]
result = [1,2,3,4]

Solution

Your current solution is O(NK^2) (assuming K > log N). With some analysis, I believe you can reduce this to O(NK).

The closest set of size K will consist of elements that are adjacent in the sorted list. You essentially have to first sort the array, so the subsequent analysis will assume that each sequence of K numbers is sorted, which allows the double sum to be simplified.
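The adjacency claim can be spot-checked on small inputs. The sketch below is not part of the original answer: it compares an exhaustive search over all size-K subsets with a search over only the windows of K adjacent elements in the sorted array. The helper names (`pairwise_closeness`, `best_by_brute_force`, `best_contiguous_window`) are made up for this illustration, and the code assumes Python 3.

```python
from itertools import combinations


def pairwise_closeness(values):
    """Sum of |x_i - x_j| over all unordered pairs (the question's metric)."""
    return sum(abs(p - q) for p, q in combinations(values, 2))


def best_by_brute_force(a, k):
    """Minimum closeness over every size-k subset (exponential; tiny inputs only)."""
    return min(pairwise_closeness(c) for c in combinations(a, k))


def best_contiguous_window(a, k):
    """Minimum closeness over windows of k adjacent elements of sorted(a)."""
    s = sorted(a)
    return min(pairwise_closeness(s[i:i + k]) for i in range(len(s) - k + 1))


array = [10, 100, 300, 200, 1000, 20, 30]  # first example from the question
print(best_by_brute_force(array, 3), best_contiguous_window(array, 3))  # 40 40
```

Both searches agree on the question's first example, where the best subset [10, 20, 30] has closeness 10 + 20 + 10 = 40. Agreement on a few inputs is only a sanity check, not a proof of the claim.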

Assuming that the array is sorted such that x[j] >= x[i] when j > i, we can rewrite your closeness metric to eliminate the absolute value:

$$\sum_{1 \le i < j \le K} \left( x_j - x_i \right)$$

Next we rewrite your notation into a double summation with simple bounds:

$$\sum_{i=1}^{K-1} \sum_{j=i+1}^{K} \left( x_j - x_i \right)$$

Notice that we can rewrite the inner distance between x[i] and x[j] as a third summation:

$$\sum_{i=1}^{K-1} \sum_{j=i+1}^{K} \sum_{l=i}^{j-1} d_l$$

where I've used d[l] to simplify the notation going forward:

$$d_l = x_{l+1} - x_l$$

Notice that d[l] is the distance between each adjacent element in the list. Look at the structure of the inner two summations for a fixed i:

j=i+1         d[i]
j=i+2         d[i] + d[i+1]
j=i+3         d[i] + d[i+1] + d[i+2]
...
j=K=i+(K-i)   d[i] + d[i+1] + d[i+2] + ... + d[K-1]
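The rows above rely on x[j] - x[i] being the sum of the adjacent gaps d[i], ..., d[j-1]. A minimal sketch of that telescoping step follows, assuming Python 3 and 0-based list indices (the derivation itself is 1-based); `distance_via_gaps` is a hypothetical helper name, and the data is the question's first example.

```python
x = sorted([10, 100, 300, 200, 1000, 20, 30])      # sample data from the question
d = [x[l + 1] - x[l] for l in range(len(x) - 1)]   # d[l] = x[l+1] - x[l]


def distance_via_gaps(i, j):
    """x[j] - x[i] for i < j, written as the sum of the gaps d[i], ..., d[j-1]."""
    return sum(d[i:j])


# For a fixed i, each larger j adds exactly one more gap to the running sum.
i = 0
for j in range(i + 1, len(x)):
    print(j, x[j] - x[i], distance_via_gaps(i, j))
```

Each printed row shows the direct difference next to the sum of gaps, which should match for every j.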

Notice the triangular structure of the inner two summations. This allows us to rewrite the inner two summations as a single summation in terms of the distances of adjacent terms:

total: (K-i)*d[i] + (K-i-1)*d[i+1] + ... + 2*d[K-2] + 1*d[K-1]

which reduces the total sum to:

$$\sum_{i=1}^{K-1} \sum_{l=i}^{K-1} (K - l)\, d_l$$

Now we can look at the structure of this double summation:

i=1     (K-1)*d[1] + (K-2)*d[2] + (K-3)*d[3] + ... + 2*d[K-2] + d[K-1]
i=2                  (K-2)*d[2] + (K-3)*d[3] + ... + 2*d[K-2] + d[K-1]
i=3                               (K-3)*d[3] + ... + 2*d[K-2] + d[K-1]
...
i=K-2                                                2*d[K-2] + d[K-1]
i=K-1                                                           d[K-1]

Again, notice the triangular pattern. The total sum then becomes:

1*(K-1)*d[1] + 2*(K-2)*d[2] + 3*(K-3)*d[3] + ... + (K-2)*2*d[K-2] 
  + (K-1)*1*d[K-1]

Or, written as a single summation:

$$\sum_{l=1}^{K-1} l\,(K - l)\, d_l$$

This compact single summation of adjacent differences is the basis for a more efficient algorithm:

  1. Sort the array, order O(N log N)
  2. Compute the differences of each adjacent element, order O(N)
  3. Iterate over each of the N-K+1 windows of K-1 consecutive differences and calculate the above sum, order O(NK)

Note that the second and third step could be combined, although with Python your mileage may vary.
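Before relying on the weighted sum of differences, it may be worth confirming numerically that it matches the pairwise definition from the question. The sketch below assumes Python 3; `pairwise_closeness` and `weighted_gap_sum` are illustrative names, and the random windows are test data invented for this check.

```python
import random
from itertools import combinations


def pairwise_closeness(window):
    """Sum of |x_i - x_j| over all pairs in the window."""
    return sum(abs(p - q) for p, q in combinations(window, 2))


def weighted_gap_sum(window):
    """Sum of l*(K-l)*d[l] for l = 1..K-1, where d[l] is the l-th adjacent gap.

    The window must already be sorted in ascending order.
    """
    k = len(window)
    return sum(l * (k - l) * (window[l] - window[l - 1]) for l in range(1, k))


rng = random.Random(0)  # fixed seed so the check is repeatable
for _trial in range(200):
    k = rng.randint(2, 8)
    window = sorted(rng.randint(0, 1000) for _n in range(k))
    assert pairwise_closeness(window) == weighted_gap_sum(window)
print("identity held on all sampled windows")
```

The pairwise form costs O(K^2) per window while the weighted form costs O(K), which is where the drop from O(NK^2) to O(NK) comes from.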

The code:

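def closeness(diff,K):
  # Weighted sum of the K-1 adjacent gaps in one window: (i+1)*(K-(i+1))*d
  acc = 0  # integer accumulator keeps the sum exact for integer input
  for (i,v) in enumerate(diff):
    acc += (i+1)*(K-(i+1))*v
  return acc

def closest(a,K):
  a.sort()  # note: sorts the caller's list in place
  N = len(a)
  diff = [ a[i+1] - a[i] for i in range(N-1) ]

  min_ind = 0
  min_val = closeness(diff[0:K-1],K)

  # Check the remaining windows; on ties the earliest window is kept
  for ind in range(1,N-K+1):
    cl = closeness(diff[ind:ind+K-1],K)
    if cl < min_val:
      min_ind = ind
      min_val = cl

  return a[min_ind:min_ind+K]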
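The question also mentions not checking whether more than one solution exists. One possible variation, sketched here under the assumption of Python 3, collects every sorted window that ties for the minimum instead of only the first. `all_closest` is a hypothetical name that does not appear in the original answer, and the second input is made up to show a tie.

```python
def closeness(diff, k):
    """Weighted sum of the k-1 adjacent gaps in one window."""
    return sum((i + 1) * (k - (i + 1)) * v for i, v in enumerate(diff))


def all_closest(a, k):
    """Return every window of k sorted-adjacent elements that attains the minimum."""
    s = sorted(a)  # works on a copy, so the caller's list is left untouched
    diff = [s[i + 1] - s[i] for i in range(len(s) - 1)]
    scores = [closeness(diff[i:i + k - 1], k) for i in range(len(s) - k + 1)]
    best = min(scores)
    return [s[i:i + k] for i, score in enumerate(scores) if score == best]


print(all_closest([10, 100, 300, 200, 1000, 20, 30], 3))  # [[10, 20, 30]]
print(all_closest([1, 2, 5, 6], 2))                       # [[1, 2], [5, 6]]
```

This keeps the same O(N log N + NK) cost. It only reports ties among windows of the sorted array, so an input with repeated values can yield windows that look identical.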
