Python:使用dict来加快对元组列表的排序 [英] Python: using a dict to speed sorting of a list of tuples

查看:127
本文介绍了Python:使用dict来加快对元组列表的排序的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

由于某种原因,我一直在如何排序这个元组列表问题。 (我以前的一个问题:按任意键排列元组的列表



这是一些任意的原始输入:

  number_of = 3#或任何
tuple_list = [(n,'a','b','c')for n in xrange(number_of)]#[(0,'a','b','c' )...]
ordering_list = random.sample(range(number_of),number_of)#eg [1,0,2]

排序 tuple_list ordering_list 使用排序:

  ordered = sorted(tuple_list,key = lambda t:ordering_list.index(t [0]))
#ordered = [(1,'a','b','c'),(0,'a','b' c'),(2,'a','b','c')]

有一个稍微尴尬的做法似乎要快得多,特别是当 tuple_list 中的元素数量增加时。我创建一个字典,将元组分解成(tuple [0],tuple [1:]) list_dict 。我使用 ordering_list 作为键检索字典项,然后重新组合(tuple [0],tuple [1:])到一个元组列表中,使用一个成语我仍然试图完全包围我的头: zip(* [iter(_list)] * x)其中 x 是由 _list 中的项组成的每个元组的长度。所以我的问题是:是否有一个版本的这种方法是管理拆卸 - 更好地重新组装部分代码?

  def gen_key_then_values(key_list,list_dict):
key_list中的键:
values = list_dict [key]
yield key

在值中的n:
yield ñ

list_dict = {T [0]:T [1:用于在tuple_list吨}
有序=拉链(* [gen_key_then_values(ordering_list,list_dict)] * 4)

注意更好的代码,使用来自Steve Jessop的一个明显的评论:



'pre> list_dict = {T [0]:吨用于tuple_list吨}
有序= [list_dict [K]对于k在ordering_list]
(k,['a','b']组合一个元组,c $ c>

$ b $ ...]) list_dict 中检索的项目,但是我没有理由将这部分代码包含在这里。

解决案

在字典打破 tuple_list 的元素分开并没有真正获得你什么,需要创造一堆的价值观更多元组。所有你正在做的是根据他们的第一个元素查找列表中的元素,所以它可能不值得实际拆分:

  list_dict = {T [0]:吨用于tuple_list吨} 

请注意,这仅适用于如果第一个元素是唯一的,但是如果第一个元素是唯一的,那么 ordering_list 只有这样才是有意义的,所以这很可能。



zip(* [iter(_list)] * 4)只是将 _list 分组为四,所以给它一个合适的名字,你会不会担心它:

 高清fixed_size_groups(N,迭代器): 
return zip(* [iter(iterable)] * n)

但是,你实际上并不需要它无论如何:

 有序=名单(list_dict [VAL]为VAL在ordering_list)

您的第一个代码缓慢的原因是 ordering_list.index ordering_list t [0] ,它为每个 T 。所以总是会对列表元素执行(number_of ** 2)/ 2


For some reason, I keep having 'how do I sort this list of tuples' questions. (A prior question of mine: sorting list of tuples by arbitrary key).

Here is some arbitrary raw input:

number_of = 3  # or whatever
tuple_list = [(n, 'a', 'b', 'c') for n in xrange(number_of)]  # [(0, 'a', 'b', 'c')...]
ordering_list = random.sample(range(number_of), number_of)  # e.g. [1, 0, 2]

Sorting tuple_list by ordering_list using sorted:

ordered = sorted(tuple_list, key=lambda t: ordering_list.index(t[0]))
# ordered = [(1, 'a', 'b', 'c'), (0, 'a', 'b', 'c'), (2, 'a', 'b', 'c')]

I have a slightly awkward approach which seems to be much faster, especially as the number of elements in the tuple_list grows. I create a dictionary, breaking the tuple into (tuple[0], tuple[1:]) items inside dictionary list_dict. I retrieve the dictionary item using ordering_list as keys, and then re-assemble the sequence of (tuple[0], tuple[1:]) into a list of tuples, using an idiom I'm still trying to wrap my head around completely: zip(*[iter(_list)] * x) where x is the length of each tuple composed of items from _list. So my question is: is there a version of this approach which is manages the disassemble - reassemble part of the code better?

def gen_key_then_values(key_list, list_dict):
    for key in key_list:
        values = list_dict[key]
        yield key

        for n in values:
            yield n

list_dict = {t[0]: t[1:] for t in tuple_list}
ordered = zip(*[gen_key_then_values(ordering_list, list_dict)] * 4)

NOTE BETTER CODE, using an obvious comment from Steve Jessop below:

list_dict = {t[0]: t for t in tuple_list}
ordered = [list_dict[k] for k in ordering_list]

My actual project code still requires assembling a tuple for each (k, ['a', 'b' ...]) item retrieved from the list_dict but there was no reason for me to include that part of the code here.

解决方案

Breaking the elements of tuple_list apart in the dictionary doesn't really gain you anything and requires creating a bunch more tuples for the values. All you're doing is looking up elements in the list according to their first element, so it's probably not worth actually splitting them:

list_dict = { t[0] : t for t in tuple_list }

Note that this only works if the first element is unique, but then the ordering_list only makes sense if the first element is unique, so that's probably OK.

zip(*[iter(_list)] * 4) is just a way of grouping _list into fours, so give it a suitable name and you won't have to worry about it:

def fixed_size_groups(n, iterable):
    return zip(*[iter(iterable)] * n)

But all things considered you don't actually need it anyway:

ordered = list(list_dict[val] for val in ordering_list)

The reason your first code is slow, is that ordering_list.index is slow -- it searches through the ordering_list for t[0], and it does this once for each t. So in total it does (number_of ** 2) / 2 inspections of a list element.

这篇关于Python:使用dict来加快对元组列表的排序的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆