如何计算列表的最小不公平总和 [英] how to calculate the minimum unfairness sum of a list

查看:56
本文介绍了如何计算列表的最小不公平总和的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

Edit1 ::

对于这个问题的未来访问者,我到目前为止的结论是,

就是方差不公平金额完全无关(它们与 STRONGLY 相关)这意味着在很多整数列表中,具有最小方差的列表不一定总是具有<$ c $的列表c>最小不公平金额。如果您想知道为什么,我实际上是在数学的堆栈交换 HERE 中找到一个单独的问题数学家为我证明了xD(值得一看,因为这是出乎意料的)。

For future visitors of this question, the conclusions I have till now are,
that variance and unfairness sum are not PERFECTLY related (they are STRONGLY related) WHICH MEANS that among a lots of lists of integers, a list with minimum variance DOESN'T ALWAYS HAS TO BE the list with minimum unfairness sum. IF YOU WANT TO KNOW WHY, I ACTUALLY ASKED THIS AS A SEPARATE QUESTION IN MATH'S STACK-EXCHANGE HERE where one of the mathematicians proved it for me xD (and it's worth taking a look, 'cause it was unexpected)

就整个问题而言,您可以阅读archer&下面的Attersson(仍在尝试找出一种简单的方法来实现这一目标-尽管现在应该还不算遥远)

As far as the question is concerned overall, you can read answers by archer & Attersson below (still trying to figure out a naive approach to carry this out - it shouldn't be far by now though)

原始问题从这里开始

我试图总结问题陈述,例如:

I have tried to summarize the problem statement something like this::

给出 n k 和一个数组(列表) arr 其中 n = len(arr) k 整数 set(1,n)中的 integer

Given n, k and an array(a list) arr where n = len(arr) and k is an integer in set (1, n) inclusive.

对于数组(或列表) myList 最小不公平总和 被定义为所有可能对之间的绝对差额(组合的总和 myList

For an array (or list) myList, The MINIMUM UNFAIRNESS SUM is defined as the sum of the absolute differences between all possible pairs (combinations with 2 elements each) in myList.

要解释:如果 mylist = [1、2、5、5、6] ,然后最小不公平金额 MUS )[请注意,元素被认为是唯一由列表中的索引而不是其]

To explain: if mylist = [1, 2, 5, 5, 6] then Minimum unfairness sum (MUS) [Please note that elements are considered unique by their index in list not their values]

MUS = |1-2| + |1-5| + |1-5| + |1-6| + |2-5| + |2-5| + |2-6| + |5-5| + |5-6| + |5-6|

如果您确实需要查看问题说明,它是这里

If you actually need to look at the problem statement, It's HERE

我的目标 $ b在给定 n,k,arr (如上所述)的情况下,从所有$ b中找出最小不公平总和 CONSTRAINT 可能会导致子数组的不公平总和,每个 len(sub array)= k [这是一件很容易的事情,我相信:)]

My Objective given n, k, arr(as described above), find the Minimum Unfairness Sum out of all of the unfairness sums of sub arrays possible with a CONSTRAINT that each len(sub array) = k [which is a good thing to make our lives easy, I believe :) ]

我已经尝试过的方法

好,这里有很多要添加的内容,所以我会尽量做到简短。

well, there is a lot to be added in here, so I'll try to be as short as I can.

我的第一种方法 就是我在这里使用 itertools.combinations 获取所有可能的组合,并使用 statistics.variance 检查其价差数据(是的,我知道我很烂)。

在您看到下面的代码之前,请您考虑一下这些方差和最小不正和SUM( MUS )完全相关(我知道它们之间有很强的相关性),即子数组的最小方差为。 code>必须是具有 MUS ??

My First approach was this where i used itertools.combinations to get all the possible combinations and statistics.variance to check its spread of data (yeah, I know I'm a mess).
Before you see the code below, DO YOU THINK THESE VARIANCE AND MINIMUM UNFAIRNESS SUM(MUS) ARE PERFECTLY RELATED (i know they are strongly related) i.e. the sub array with minimum variance HAS TO BE the sub array with MUS??

的子数组,您只需要检查 LetMeDoIt(n,k,arr)函数(如果您需要 MCVE ,请检查下面的第二个代码段)

YOU ONLY HAVE TO CHECK THE LetMeDoIt(n, k, arr) FUNCTION (If you need MCVE, check the second code snippet below)

from itertools import combinations as cmb
from statistics import variance as varn

def LetMeDoIt(n, k, arr):
    v = []
    s = []
    subs = [list(x) for x in list(cmb(arr, k))]  # getting all sub arrays from arr in a list

    i = 0
    for sub in subs:
        if i != 0:
            var = varn(sub)  # the variance thingy
            if float(var) < float(min(v)):
                v.remove(v[0])
                v.append(var)
                s.remove(s[0])
                s.append(sub)
            else:
                pass

        elif i == 0:
            var = varn(sub)
            v.append(var)
            s.append(sub)
            i = 1

    final = []
    f = list(cmb(s[0], 2))  # getting list of all pairs (after determining sub array with least MUS)
    
    for r in f:
        final.append(abs(r[0]-r[1]))  # calculating the MUS in my messy way

    return sum(final)

上面的代码对于 n< 30 ,但超出此范围又引发了 MemoryError
在Python聊天中,凯文建议我尝试使用生成器,该生成器内存有效(的确如此),但是由于生成器还在我们 iterate 上动态生成这些组合时,假设n = 50耗时140小时(:/),k = 8

The above code works fine for n<30 but raised a MemoryError beyond that. In Python chat, Kevin suggested me to try generator which is memory efficient (it really is), but as generator also generates those combination on the fly as we iterate over them, it was supposed to take over 140 hours (:/) for n=50, k=8 as estimated.

我在 SO上发布了相同的问题此处(您可能想要看一下以正确理解我-它进行了讨论并通过融合给出了答案,这使我进入了第二种方法-一种更好的方法(我应该说融合方法xD)。

I POSTED THE SAME AS A QUESTION ON 'SO' HERE (you might wanna have a look to understand me properly - it has discussions and an answer by fusion which takes me to my second approach - a better one(i should say fusion's approach xD)).

第二种方法

from itertools import combinations as cmb

def myvar(arr):   # a function to calculate variance
    l = len(arr)
    m = sum(arr)/l
    return sum((i-m)**2 for i in arr)/l

def LetMeDoIt(n, k, arr):
    sorted_list = sorted(arr)  # i think sorting the array makes it easy to get the sub array with MUS quickly
    variance = None
    min_variance_sub = None
    
    for i in range(n - k + 1):
        sub = sorted_list[i:i+k]
        var = myvar(sub)
        if variance is None or var<variance:
            variance = var
            min_variance_sub=sub
            
    final = []
    f = list(cmb(min_variance_sub, 2))  # again getting all possible pairs in my messy way

    for r in f:
        final.append(abs(r[0] - r[1]))

    return sum(final)

def MainApp():
    n = int(input())
    k = int(input())

    arr = list(int(input()) for _ in range(n))

    result = LetMeDoIt(n, k, arr)

    print(result)    

if __name__ == '__main__':
    MainApp()

此代码非常适合 n至1000 (可能更多),但由于时间而终止out (5秒是线法官:/)的n超过 10000 (最大的测试用例为 n = 100000 )。

This code works perfect for n up to 1000 (maybe more), but terminates due to time out (5 seconds is the limit on online judge :/ ) for n beyond 10000 (the biggest test case has n=100000).

=====
在给定的时间限制(5秒)内,您将如何处理此问题来处理所有测试用例? (问题列在算法动态编程下)

(用于您的参考文献可以看看

(for your references you can have a look on


  1. 其他候选人就此问题成功提交了(py3,py2,C ++,java)-以便您可以
    来为我和未来的访问者解释这种方法

  2. 社论,由问题解决者解释如何处理问题

  3. 解决问题的人自己的解决方案代码( py2,C ++)。

  4. 输入数据(测试例)和预期输出

  1. successful submissions(py3, py2, C++, java) on this problem by other candidates - so that you can explain that approach for me and future visitors)
  2. an editorial by the problem setter explaining how to approach the question
  3. a solution code by problem setter himself (py2, C++).
  4. Input data (test cases) and expected output

感谢您的帮助或建议:)

THANK YOU FOR ANY HELP OR SUGGESTION :)

推荐答案

I看到这个问题仍然没有完整的答案。我会写一条正确的算法的轨道,它将通过评审。我不会为了尊重Hackerrank挑战的目的而编写代码。由于我们有可行的解决方案。

I see this question still has no complete answer. I will write a track of a correct algorithm which will pass the judge. I will not write the code in order to respect the purpose of the Hackerrank challenge. Since we have working solutions.


  1. 必须对原始数组进行排序。这具有O(NlogN)

  1. The original array must be sorted. This has a complexity of O(NlogN)

的复杂度,此时您可以检查连续的子数组,因为非连续的子数组会导致更糟的情况。 (或等于但不更好)不公平金额。弓箭手的答案中也对此进行了解释。

At this point you can check consecutive sub arrays as non-consecutive ones will result in a worse (or equal, but not better) "unfairness sum". This is also explained in archer's answer

最后一次检查以找出最小的不公平金额。可以在O(N)中完成。您需要为每个连续的k长子数组计算US。错误是在O(k)中完成的每个步骤都需要重新计算,这将这段代码的复杂性带到O(k * N)。正如您所发表的社论所显示的那样,它可以在O(1)中完成,包括数学公式。它要求在第1步之后对累积数组进行先前的初始化(在O(N)中完成,并且空间复杂度为O(N))。

The last check passage, to find the minimum "unfairness sum" can be done in O(N). You need to calculate the US for every consecutive k-long subarray. The mistake is recalculating this for every step, done in O(k), which brings the complexity of this passage to O(k*N). It can be done in O(1) as the editorial you posted shows, including mathematic formulae. It requires a previous initialization of a cumulative array after step 1 (done in O(N) with space complexity O(N) too).



它可以工作,但由于n <= 10000超时而终止。

It works but terminates due to time out for n<=10000.

(来自评论弓箭手的问题)

(from comments on archer's question)

要解释步骤3,请考虑k =100。要滚动N长数组和第一次迭代,必须从元素0计算子数组的US。照常增加到99,需要100个段落。下一步,您需要为一个与前一个数组仅相差1个元素1到100的子数组计算相同的值,然后再计算2到101个元素,依此类推。
如果有帮助,可将其视为蛇。删除一个块,然后添加一个。
无需执行整个O(k)滚动。只需按照社论中的说明计算数学,然后您将在O(1)中进行。

To explain step 3, think about k = 100. You are scrolling the N-long array and the first iteration, you must calculate the US for the sub array from element 0 to 99 as usual, requiring 100 passages. The next step needs you to calculate the same for a sub array that only differs from the previous by 1 element 1 to 100. Then 2 to 101, etc. If it helps, think of it like a snake. One block is removed and one is added. There is no need to perform the whole O(k) scrolling. Just figure the maths as explained in the editorial and you will do it in O(1).

因此,由于第一种排序,最终的复杂度将渐近地变为O(NlogN)。

So the final complexity will asymptotically be O(NlogN) due to the first sort.

这篇关于如何计算列表的最小不公平总和的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆