Getting unique combinations from a non-unique list of items, FASTER?


Question

First off, I am able to do it, but I am not happy with the speed.

My question is: is there a better, faster way of doing this?

I have a list of items that looks like this:

[(1,2), (1,2), (4,3), (7,8)]

I need to get all the unique combinations. For example, the unique combinations of 2 items would be:

[(1,2), (1,2)], [(1,2), (4,3)], [(1,2), (7,8)], [(4,3), (7,8)]

After using itertools.combinations I get a lot more than that because of duplicates. For example, I get every list containing (1,2) twice. If I create a set of these combinations, I get the unique ones. The problem comes when the original list has 80 tuples and I want combinations of 6 items. Building that set takes more than 30 seconds. If I could get that number down, I would be very happy.

I am aware that the number of combinations is huge and that this is why creating the set is time-consuming. But I am still hoping there is a library that has optimized the process in some way, speeding it up a bit.

It might be important to note that, of all the combinations I find, I test only the first 10000 or so. In some cases there can be far too many combos to process, and I don't want to spend too much time on them because there are other tests to be done too.
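For a sense of scale, here is a rough sketch of how many tuples set(combinations(ls, 6)) has to hash for an 80-element list before any duplicate is discarded. It assumes Python 3.8 or later for math.comb, and the variable names are only illustrative:

```python
from math import comb  # available since Python 3.8

n_items = 80      # size of the original list
combo_size = 6    # items per combination

# Number of tuples combinations() produces, duplicates included.
total = comb(n_items, combo_size)
print(total)  # 300500200
```

Every one of those tuples is built and hashed even if only the first 10000 unique ones are ever tested.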

This is a sample of what I have now:

from itertools import combinations

# ls is a list of random NON-unique tuples (x, y), for example:
# ls = [(1,2), (1,2), (4,3), (7,8)]
# The second code snippet shows how I generate ls for testing.

all_combos = combinations(ls, 6)
all_combos_set = set(all_combos)

for combo in all_combos_set:
  do_some_test_on(combo)

In case you want to test it out, here is what I use for timing different methods:

import time
from itertools import combinations
from random import gauss, randint

def main3():
    tries = 4
    elements_in_combo = 6
    rng = 90
    data = [0]*rng
    for tr in range(tries):
        for n in range(1, rng):
            quantity = 0
            name = (0,0)
            ls = []
            for i in range(n):
                if quantity == 0:
                    quantity = int(abs(gauss(0, 4)))
                    if quantity != 0:
                        quantity -= 1
                    name = (randint(1000,7000), randint(1000,7000))
                    ls.append(name)
                else:
                    quantity -= 1
                    ls.append(name)

            start_time = time.time()
            all_combos = combinations(ls, elements_in_combo)
            all_combos = set(all_combos)

            duration = time.time() - start_time
            data[n] += duration
            print(n, "random files take", duration, "seconds.")

            if duration > 30:
                break

    for i in range(rng):
        print("average duration for", i, "is", (data[i]/tries), "seconds.")


Recommended answer

The original question, "Is there a better, faster way of doing this?", actually contains two questions:

  • Is there a faster way?

  • Is there a better way?

I would like to narrow the question "Is there a faster way?" down to:

Is there a FASTER way of removing duplicates from a list than doing it as follows:


lstWithUniqueElements = list(set(lstWithDuplicates))


To my knowledge, there is no faster way ...

Now let's concentrate on the second part of the question ("Is there a better way?"). A question like this is usually very hard to answer and needs much discussion, but that is not the case here, because the author of the question has already stated clearly what a better way would be (quote):


I'd love to use a generator function. The itertools combinations() itself is an iterable and not a list or set, so if I figure out how to yield the unique combinations that'd be great.

So here it is:

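def uniqueCombinations(lstList, comboSize):
    from itertools import combinations
    # Note: this sorts the caller's list in place. Sorting makes every
    # combination come out in one canonical element order.
    lstList.sort()
    allCombos = combinations(lstList, comboSize)
    setUniqueCombos = set()  # combinations that were already yielded
    for comboCandidate in allCombos:
        if comboCandidate in setUniqueCombos:
            continue
        yield comboCandidate
        setUniqueCombos.add(comboCandidate)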

That's it ...
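A minimal usage sketch, assuming the goal stated in the question of testing only the first 10000 or so unique combinations. itertools.islice stops the generator early, so the full set is never built. The generator is repeated in compact form only so the snippet runs on its own; ls is the example list from the question and first_combos is an illustrative name:

```python
from itertools import combinations, islice

def uniqueCombinations(lstList, comboSize):
    # Compact restatement of the generator, so this snippet is self-contained.
    lstList.sort()
    seen = set()
    for combo in combinations(lstList, comboSize):
        if combo not in seen:
            seen.add(combo)
            yield combo

ls = [(1, 2), (1, 2), (4, 3), (7, 8)]  # example list from the question

# Take at most 10000 unique combinations; generation stops once the limit is hit.
first_combos = list(islice(uniqueCombinations(ls, 2), 10000))
print(first_combos)
```

With the small example list this prints the four unique pairs. With a long list the loop ends after 10000 results instead of walking every combination.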



One more thing is worth mentioning here. The method chosen by the author of the question, building a set from combinations(), does not give unique combinations in some special cases where the list contains repeated values that are not grouped together, like this one:

set(combinations(['a','a','b','a'], 2)) gives: {('a', 'b'), ('b', 'a'), ('a', 'a')}
set(uniqueCombinations(['a','a','b','a'], 2)) gives: {('a', 'b'), ('a', 'a')}
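The difference comes from the sort: combinations() emits elements in list order, so an unsorted list can produce the same pair in two different orders. A small check, assuming the only change is sorting the input first (the variable names are illustrative):

```python
from itertools import combinations

items = ['a', 'a', 'b', 'a']

# Unsorted input: ('a', 'b') and ('b', 'a') are both produced.
unsorted_result = set(combinations(items, 2))

# Sorted input: every pair appears in one canonical order only.
sorted_result = set(combinations(sorted(items), 2))

print(len(unsorted_result), len(sorted_result))  # 3 2
```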

In the meantime, there is a pure Python function available here on Stack Overflow that is both faster and slower than the one provided above. How can it be faster AND slower? See HERE for details.
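For illustration only, here is one possible pure Python approach that never generates duplicates in the first place. It is not the function referred to above, just a sketch under the assumption that the items are hashable. It counts each distinct value and then decides how many copies of each value to take. The name unique_combinations_direct and its helpers are hypothetical:

```python
from collections import Counter

def unique_combinations_direct(items, combo_size):
    """Yield each distinct combination once, without producing duplicates."""
    counts = list(Counter(items).items())  # [(value, multiplicity), ...]

    # suffix[i] = how many items are still available from counts[i] onward.
    suffix = [0] * (len(counts) + 1)
    for i in reversed(range(len(counts))):
        suffix[i] = suffix[i + 1] + counts[i][1]

    def build(index, remaining, prefix):
        if remaining == 0:
            yield tuple(prefix)
            return
        if remaining > suffix[index]:
            return  # not enough items left to complete this combination
        value, available = counts[index]
        # Take as many copies of this value as allowed, then fewer, down to none.
        for take in range(min(available, remaining), -1, -1):
            yield from build(index + 1, remaining - take, prefix + [value] * take)

    yield from build(0, combo_size, [])

print(list(unique_combinations_direct(['a', 'a', 'b', 'a'], 2)))
```

Whether this beats the set-based generator depends on how many duplicates the list contains, so it is worth timing both on real data.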
