Efficiently remove duplicates, order-independent, from list of unordered sets


Problem description

The following list contains duplicated sublists, but the duplicates list their elements in a different order:

l1 = [['The', 'quick', 'brown', 'fox'], ['hi', 'there'], ['jumps', 'over', 'the', 'lazy', 'dog'], ['there', 'hi'], ['jumps', 'dog', 'over','lazy', 'the']]

How can I remove them to get:

l1 = [['The', 'quick', 'brown', 'fox'], ['hi', 'there'], ['jumps', 'over', 'the', 'lazy', 'dog']]

I tried:

[list(i) for i in set(map(tuple, l1))]

Nevertheless, I do not know if this is the fastest way to do it for large lists, and my attempt does not work as desired. Any idea how to remove them efficiently?

Recommended answer

This one is a little tricky. You want to key a dict off of frozen counters, but counters are not hashable in Python. For a small degradation in the asymptotic complexity (sorting a sublist of length k costs O(k log k), whereas counting its elements costs O(k)), you could use sorted tuples as a substitute for frozen counters:

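seen = set()  # order-independent keys encountered so far
result = []
for x in l1:
    key = tuple(sorted(x))  # canonical, hashable form of the sublist
    if key not in seen:
        result.append(x)  # keep the first occurrence as originally written
        seen.add(key)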

The same idea in a one-liner would look like this:

[*{tuple(sorted(k)): k for k in reversed(l1)}.values()][::-1]
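If the cost of sorting every sublist matters, one possible way to approximate the "frozen counter" mentioned above is to build a frozenset from the (element, count) pairs of a collections.Counter. The sketch below is only an illustration, not part of the original answer: the helper name dedupe_unordered is hypothetical, and it assumes every element is hashable (which also means the elements do not need to be mutually sortable).

```python
from collections import Counter


def dedupe_unordered(lists):
    """Keep the first occurrence of each sublist, ignoring element order.

    Hypothetical helper for illustration; assumes all elements are hashable.
    """
    seen = set()
    result = []
    for x in lists:
        # A frozenset of (element, count) pairs is hashable and does not
        # depend on element order, so it can stand in for a "frozen counter".
        key = frozenset(Counter(x).items())
        if key not in seen:
            seen.add(key)
            result.append(x)
    return result


l1 = [['The', 'quick', 'brown', 'fox'], ['hi', 'there'],
      ['jumps', 'over', 'the', 'lazy', 'dog'], ['there', 'hi'],
      ['jumps', 'dog', 'over', 'lazy', 'the']]
deduped = dedupe_unordered(l1)
```

Like the sorted-tuple key, this key respects multiplicity, so ['a', 'a', 'b'] and ['a', 'b', 'b'] are treated as different sublists. Whether it is actually faster than sorting for a given input would need to be measured.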
