有效地对元组列表进行分组 [英] Group list of tuples efficiently

查看:114
本文介绍了有效地对元组列表进行分组的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有很多元组,例如 [(1,2 ,,(1,3),(1,4),(2,1),(2,3)] 等。我想转换有效地转换为 [(1,[1,2,3,4]),(2,[1,3])] 。我将每个元组的第一个元素分组为元组,即(1,2),(1,3),(1,4)变为( 1,[2,3,4])(另请参见下面的Haskell版本)。我怀疑这可以一次完成吗? 输入列表始终是有序的。

I have a large list of tuples e.g. [ (1,2), (1,3), (1,4), (2,1), (2,3) ] etc. I want to convert it to [ (1, [1,2,3,4]), (2, [1,3] ) ] efficiently. I am grouping tuples by the first element of each tuple i.e. (1,2), (1,3), (1,4) becomes (1, [2,3,4]) (also see the Haskell version below). I doubt that this can be done in one pass? The input list is always ordered.

python 中尝试使用 defaultdict 我认为这是自然的解决方案,无需重新发明轮子。它运作良好,但不保留键的顺序。一种解决方案是使用有序 defaultdict 作为

In python in tried using defaultdict which I thought was the natural solution without reinventing the wheel. It works well but it does not preserve the order of keys. One solution is to use ordered defaultdict as explained here.

无论如何,我想知道独立于语言的高效解决方案。我当前的解决方案需要两次通过并一次调用列表中的 set()

Anyway, I'd like to know the language independent and efficient solution to this problem. My current solution requires two passes and one call to set( ) on a list.

更新

我正在考虑实施以下Haskell版本:

I am thinking of implementing following Haskell version:

a = [ (1,2), (1,3), (1,4), (2,1), (2,3) ] 
b = groupBy (\ x y -> fst x == fst y ) 
b 
[[(1,2),(1,3),(1,4)],[(2,1),(2,3)]]  
map (\x -> (fst .head $ x, map snd x ) ) b 
[(1,[2,3,4]),(2,[1,3])]



答案的表现



我实现了两个答案(冷速和pm2ring)。在中等大小的列表(最多10 ^ 4个元素)上,PM2环形解决方案速度更快;尺寸为10 ^ 5时,两者都花相同的时间,在较大的清单上,COLDSPEED开始获胜。下面是数字(使用python3)。

Performance of answers

I implemented two answers (coldspeed and pm2ring). On moderate size lists (upto 10^4 elements) PM2 ring solution is faster; at size 10^5, both takes same time, on larger list COLDSPEED starts winning. Below are the numbers (with python3).

第一列是列表中的条目数,第二列是 coldspeed ,第三列显示了 pm2环解决方案所花费的时间。所有时间均以秒为单位。

First column is number of entries in list, second is time taken by coldspeed and third column has time taken by pm2 ring solutions. All times are in second.

10 0.0001 0.0000
100 0.0001 0.0000
1000 0.0005 0.0001
10000 0.0044 0.0014
100000 0.0517 0.0452
1000000 0.5579 1.5249

脚本在这里 http://github.com/dilawar/playground/raw/master/ Python / so_group_tuple.py

PM 2Ring 根据Ashwini的建议,解决方案甚至更快(大约3到5倍)。

PM 2Ring solution is even faster (roughly 3x - 5x) with Ashwini's suggestions.

10 4.887580871582031e-05 1.2636184692382812e-05
100 0.00010132789611816406 2.0742416381835938e-05
1000 0.0005109310150146484 0.000110626220703125
10000 0.004467487335205078 0.0009067058563232422
100000 0.05056118965148926 0.017516136169433594
1000000 0.6100358963012695 0.26450490951538086
10000000 6.092756509780884 2.8253660202026367

$

使用PYPY



有些混合的结果。最后一列是第2列和第3列的比率。

With PYPY

Somewhat mixed results. Last column is ratio of column 2 and column 3.

pypy so_group_tuple.py 
(10, [1.6927719116210938e-05, 3.409385681152344e-05], 0.4965034965034965)
(100, [4.601478576660156e-05, 8.296966552734375e-05], 0.5545977011494253)
(1000, [0.010258913040161133, 0.0019040107727050781], 5.388054094665665)
(10000, [0.0002448558807373047, 0.00021600723266601562], 1.1335540838852096)
(100000, [0.002658843994140625, 0.0018231868743896484], 1.45834967961292)
(1000000, [0.0833890438079834, 0.02979302406311035], 2.7989452709245284)
(10000000, [1.0556740760803223, 0.6789278984069824], 1.5549133841124023)

我要使用 PM 2Ring 解决方案,因为它更快,直到列表大小为10 ^ 5

I am going with PM 2Ring solution since it is much faster till list size 10^5.

推荐答案

您可以使用 itertools.groupby ,并使用 zip 重新排列收集到的组中的数据:

You can do this with itertools.groupby, and using zip to rearrange the data from the collected groups:

from itertools import groupby
from operator import itemgetter

a = [(1, 2), (1, 3), (1, 4), (2, 1), (2, 3)]
b = [(k, list(list(zip(*g))[1])) for k, g in groupby(a, itemgetter(0))]
print(b)

输出

[(1, [2, 3, 4]), (2, [1, 3])]






该列表组合有点密集。这是使用传统的 for 循环的一种变体,该循环打印中间结果,以便于更轻松地了解正在发生的事情。


That list comp is a bit dense. Here's a variation using a traditional for loop that prints an intermediate result to make it a little easier to see what's going on.

b = []
for k, g in groupby(a, itemgetter(0)):
    t = list(zip(*g))
    print(t)
    b.append(list(t[1]))

print('Output', b)

输出

[(1, 1, 1), (2, 3, 4)]
[(2, 2), (1, 3)]
Output [[2, 3, 4], [1, 3]]






正如Ashwini Chaudhary在评论中提到的,在其中嵌套另一个list comp可以使代码很多更具可读性,因为它可以避免多次调用,因此效率也可能更高。


As Ashwini Chaudhary mentions in the comments, nesting another list comp in there makes the code much more readable, it's probably also more efficient, since it avoids a couple of calls.

b = [(k, [x for _, x in g]) for k, g in groupby(a, itemgetter(0))]

这篇关于有效地对元组列表进行分组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆