Fast removal of consecutive duplicates in a list and corresponding items from another list

Question

My question is similar to this previous SO question. I have two very large lists of data (almost 20 million data points each) that contain numerous consecutive duplicates. I would like to remove the consecutive duplicates as follows:

list1 = [1,1,1,1,1,1,2,3,4,4,5,1,2]  # This is 20M long!
list2 = ...  # another list of size len(list1), also 20M long!
i = 0
while i < len(list1)-1:
    if list1[i] == list1[i+1]:
        # drop the earlier element of an equal adjacent pair, in both lists
        del list1[i]
        del list2[i]
    else:
        i = i+1

The output should be [1, 2, 3, 4, 5, 1, 2] for the first list. Unfortunately, this is very slow, since deleting an element from a list is itself a slow operation. Is there any way I can speed up this process? Please note that, as shown in the code snippet above, I also need to keep track of the index i so that I can remove the corresponding element in list2.

Recommended answer

Python's standard library has itertools.groupby for this:

>>> list1 = [1,1,1,1,1,1,2,3,4,4,5,1,2]
>>> from itertools import groupby
>>> [k for k,_ in groupby(list1)]
[1, 2, 3, 4, 5, 1, 2]
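
To make it clearer why this works, here is a small sketch (not part of the original answer) that prints each key together with the length of its run. groupby starts a new group whenever the key changes, so values that reappear later, like the trailing 1 and 2, form their own groups rather than being merged with earlier ones. The name `runs` is just an illustrative choice.

```python
from itertools import groupby

list1 = [1, 1, 1, 1, 1, 1, 2, 3, 4, 4, 5, 1, 2]

# Each group is an iterator over one run of equal consecutive items;
# consuming it with list() lets us count the run length.
runs = [(key, len(list(group))) for key, group in groupby(list1)]
print(runs)
# [(1, 6), (2, 1), (3, 1), (4, 2), (5, 1), (1, 1), (2, 1)]
```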

You can tweak it using the keyfunc argument, to also process the second list at the same time.

>>> list1 = [1,1,1,1,1,1,2,3,4,4,5,1,2]
>>> list2 = [9,9,9,8,8,8,7,7,7,6,6,6,5]
>>> from operator import itemgetter
>>> keyfunc = itemgetter(0)
>>> [next(g) for k,g in groupby(zip(list1, list2), keyfunc)]
[(1, 9), (2, 7), (3, 7), (4, 7), (5, 6), (1, 6), (2, 5)]

If you want to split those pairs back into separate sequences again:

>>> list(zip(*_))  # "unzip" them (list() is needed in Python 3, where zip returns an iterator)
[(1, 2, 3, 4, 5, 1, 2), (9, 7, 7, 7, 6, 6, 5)]
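
Putting the pieces together, a minimal Python 3 sketch might look like the following. The function name `dedupe_consecutive` and its parameter names are hypothetical, chosen here for illustration. It builds new lists in a single pass instead of deleting in place, which avoids the repeated `del` calls from the question. Note that, like the snippets above, it keeps the first paired value of each run, whereas the loop in the question keeps the last one.

```python
from itertools import groupby
from operator import itemgetter


def dedupe_consecutive(keys, values):
    """Collapse consecutive runs in `keys`, keeping the first (key, value) pair of each run."""
    # Group the zipped pairs by their first component and take the first pair of each group.
    firsts = [next(group) for _, group in groupby(zip(keys, values), key=itemgetter(0))]
    if not firsts:
        # zip(*[]) yields nothing to unpack, so handle empty input explicitly.
        return [], []
    new_keys, new_values = zip(*firsts)
    return list(new_keys), list(new_values)


list1 = [1, 1, 1, 1, 1, 1, 2, 3, 4, 4, 5, 1, 2]
list2 = [9, 9, 9, 8, 8, 8, 7, 7, 7, 6, 6, 6, 5]
new1, new2 = dedupe_consecutive(list1, list2)
print(new1)  # [1, 2, 3, 4, 5, 1, 2]
print(new2)  # [9, 7, 7, 7, 6, 6, 5]
```

If the last value of each run is needed instead, one option is to exhaust each group and keep its final pair rather than calling next on it.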
