Python:使用dict来加快对元组列表的排序 [英] Python: using a dict to speed sorting of a list of tuples
问题描述
由于某种原因,我一直在如何排序这个元组列表问题。 (我以前的一个问题:按任意键排列元组的列表 )
这是一些任意的原始输入:
number_of = 3#或任何
tuple_list = [(n,'a','b','c')for n in xrange(number_of)]#[(0,'a','b','c' )...]
ordering_list = random.sample(range(number_of),number_of)#eg [1,0,2]
排序 tuple_list
由 ordering_list
使用排序:
ordered = sorted(tuple_list,key = lambda t:ordering_list.index(t [0]))
#ordered = [(1,'a','b','c'),(0,'a','b' c'),(2,'a','b','c')]
有一个稍微尴尬的做法似乎要快得多,特别是当 tuple_list
中的元素数量增加时。我创建一个字典,将元组
分解成(tuple [0],tuple [1:])
list_dict
。我使用 ordering_list
作为键检索字典项,然后重新组合(tuple [0],tuple [1:])
到一个元组列表中,使用一个成语我仍然试图完全包围我的头: zip(* [iter(_list)] * x)
其中 x
是由 _list
中的项组成的每个元组的长度。所以我的问题是:是否有一个版本的这种方法是管理拆卸 - 更好地重新组装部分代码?
def gen_key_then_values(key_list,list_dict):
key_list中的键:
values = list_dict [key]
yield key
在值中的n:
yield ñ
list_dict = {T [0]:T [1:用于在tuple_list吨}
有序=拉链(* [gen_key_then_values(ordering_list,list_dict)] * 4)
注意更好的代码,使用来自Steve Jessop的一个明显的评论:
'pre>
list_dict = {T [0]:吨用于tuple_list吨}
有序= [list_dict [K]对于k在ordering_list]
$我的实际项目代码仍然需要为每个(k,['a','b']组合一个元组,c $ c> $ b $ ...])
从 list_dict
中检索的项目,但是我没有理由将这部分代码包含在这里。解决案 在字典打破 tuple_list
的元素分开并没有真正获得你什么,需要创造一堆的价值观更多元组。所有你正在做的是根据他们的第一个元素查找列表中的元素,所以它可能不值得实际拆分:
list_dict = {T [0]:吨用于tuple_list吨}
请注意,这仅适用于如果第一个元素是唯一的,但是如果第一个元素是唯一的,那么 ordering_list
只有这样才是有意义的,所以这很可能。
zip(* [iter(_list)] * 4)
只是将 _list
分组为四,所以给它一个合适的名字,你会不会担心它:
高清fixed_size_groups(N,迭代器):
return zip(* [iter(iterable)] * n)
但是,你实际上并不需要它无论如何:
有序=名单(list_dict [VAL]为VAL在ordering_list)
您的第一个代码缓慢的原因是 ordering_list.index
ordering_list
为 t [0]
,它为每个 T
。所以总是会对列表元素执行(number_of ** 2)/ 2
。 For some reason, I keep having 'how do I sort this list of tuples' questions. (A prior question of mine: sorting list of tuples by arbitrary key).
Here is some arbitrary raw input:
number_of = 3 # or whatever
tuple_list = [(n, 'a', 'b', 'c') for n in xrange(number_of)] # [(0, 'a', 'b', 'c')...]
ordering_list = random.sample(range(number_of), number_of) # e.g. [1, 0, 2]
Sorting tuple_list
by ordering_list
using sorted:
ordered = sorted(tuple_list, key=lambda t: ordering_list.index(t[0]))
# ordered = [(1, 'a', 'b', 'c'), (0, 'a', 'b', 'c'), (2, 'a', 'b', 'c')]
I have a slightly awkward approach which seems to be much faster, especially as the number of elements in the tuple_list
grows. I create a dictionary, breaking the tuple
into (tuple[0], tuple[1:])
items inside dictionary list_dict
. I retrieve the dictionary item using ordering_list
as keys, and then re-assemble the sequence of (tuple[0], tuple[1:])
into a list of tuples, using an idiom I'm still trying to wrap my head around completely: zip(*[iter(_list)] * x)
where x
is the length of each tuple composed of items from _list
. So my question is: is there a version of this approach which is manages the disassemble - reassemble part of the code better?
def gen_key_then_values(key_list, list_dict):
for key in key_list:
values = list_dict[key]
yield key
for n in values:
yield n
list_dict = {t[0]: t[1:] for t in tuple_list}
ordered = zip(*[gen_key_then_values(ordering_list, list_dict)] * 4)
NOTE BETTER CODE, using an obvious comment from Steve Jessop below:
list_dict = {t[0]: t for t in tuple_list}
ordered = [list_dict[k] for k in ordering_list]
My actual project code still requires assembling a tuple for each (k, ['a', 'b' ...])
item retrieved from the list_dict
but there was no reason for me to include that part of the code here.
Breaking the elements of tuple_list
apart in the dictionary doesn't really gain you anything and requires creating a bunch more tuples for the values. All you're doing is looking up elements in the list according to their first element, so it's probably not worth actually splitting them:
list_dict = { t[0] : t for t in tuple_list }
Note that this only works if the first element is unique, but then the ordering_list
only makes sense if the first element is unique, so that's probably OK.
zip(*[iter(_list)] * 4)
is just a way of grouping _list
into fours, so give it a suitable name and you won't have to worry about it:
def fixed_size_groups(n, iterable):
return zip(*[iter(iterable)] * n)
But all things considered you don't actually need it anyway:
ordered = list(list_dict[val] for val in ordering_list)
The reason your first code is slow, is that ordering_list.index
is slow -- it searches through the ordering_list
for t[0]
, and it does this once for each t
. So in total it does (number_of ** 2) / 2
inspections of a list element.
这篇关于Python:使用dict来加快对元组列表的排序的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!