How to optimally count elements in a Python list

Question

This is almost the same question as here, except that I am asking about the most efficient solution for a sorted result.

I have a list (about 10 random integers between 0 and 12), for example:

the_list = [5, 7, 6, 5, 5, 4, 4, 7, 5, 4]

I want to create a function that returns a list of tuples (item, counts) ordered by the first element, for example

output = [(4, 3), (5, 4), (6, 1), (7, 2)]

So far I have used:

def dupli(the_list):
    return [(item, the_list.count(item)) for item in sorted(set(the_list))]

But I call this function almost a million times and I need to make it as fast as I (in Python) can. Therefore my question: how can I make this function less time-consuming? (And what about memory?)
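For comparison, the standard library's `collections.Counter` tallies every element in a single pass instead of calling `list.count` once per distinct item. A minimal sketch (the name `dupli_counter` is hypothetical, and it was not benchmarked in this thread, so whether it beats `list.count` on lists of only ~10 elements is something to measure rather than assume):

```python
from collections import Counter


def dupli_counter(the_list):
    """Return (item, count) pairs sorted by item, using collections.Counter.

    Counter counts all elements in one pass over the list. Sorting the
    resulting (item, count) pairs orders them by item, because the items
    are distinct keys and tuples compare element by element.
    """
    return sorted(Counter(the_list).items())


the_list = [5, 7, 6, 5, 5, 4, 4, 7, 5, 4]
print(dupli_counter(the_list))  # [(4, 3), (5, 4), (6, 1), (7, 2)]
```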

I have played around a bit, but nothing obvious came up:

from timeit import Timer as T
number=10000
setup = "the_list=[5, 7, 6, 5, 5, 4, 4, 7, 5, 4]"

stmt = "[(item, the_list.count(item)) for item in sorted(set(the_list))]"
T(stmt=stmt, setup=setup).timeit(number=number)

Out[230]: 0.058799982070922852

stmt = "L = []; \nfor item in sorted(set(the_list)): \n    L.append((item, the_list.count(item)))"
T(stmt=stmt, setup=setup).timeit(number=number)

Out[233]: 0.065041065216064453

stmt = "[(item, the_list.count(item)) for item in set(sorted(the_list))]"
T(stmt=stmt, setup=setup).timeit(number=number)

Out[236]: 0.098351955413818359
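The experiments above use IPython prompts. A plain-script sketch that re-runs the same three statements with `timeit.timeit`, passing the list through the `globals` argument (available since Python 3.5) instead of a setup string. No timings are shown because they depend on the machine and interpreter. Note that the third statement iterates over a set, so the order of its output is an implementation detail rather than something Python guarantees.

```python
import timeit

the_list = [5, 7, 6, 5, 5, 4, 4, 7, 5, 4]
number = 10000

# The three statements tried above, kept as strings for timeit.
candidates = {
    "list comp over sorted(set)": (
        "[(item, the_list.count(item)) for item in sorted(set(the_list))]"
    ),
    "explicit loop with append": (
        "L = []\n"
        "for item in sorted(set(the_list)):\n"
        "    L.append((item, the_list.count(item)))"
    ),
    "list comp over set(sorted)": (
        "[(item, the_list.count(item)) for item in set(sorted(the_list))]"
    ),
}

results = {}
for name, stmt in candidates.items():
    # globals= makes the_list visible to the timed statement.
    results[name] = timeit.timeit(stmt, globals={"the_list": the_list}, number=number)
    print(f"{name}: {results[name]:.4f} s for {number} runs")
```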

Thanks
Christophe

Solution

Change where you sort for a savings of about 20%.

Change this:

def dupli(the_list):
    return [(item, the_list.count(item)) for item in sorted(set(the_list))]

To this:

def dupli(the_list):
    count = the_list.count # this optimization added courtesy of Sven's comment
    result = [(item, count(item)) for item in set(the_list)]
    result.sort()
    return result

The reason this is faster is that `sorted()` must build a new temporary list, whereas `result.sort()` sorts the result list in place.
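Moving the sort does not change the output: each item appears only once in the set, so sorting the `(item, count)` tuples orders them by item alone, exactly as `sorted(set(...))` would. A side-by-side sketch that asserts the two versions agree (the names `dupli_original` and `dupli_sort_after` are hypothetical labels for the "before" and "after" functions in this answer):

```python
def dupli_original(the_list):
    # Sort the distinct items first, then count each one.
    return [(item, the_list.count(item)) for item in sorted(set(the_list))]


def dupli_sort_after(the_list):
    # Bind the method once so the loop skips the attribute lookup each time.
    count = the_list.count
    # Build the pairs in whatever order the set yields them...
    result = [(item, count(item)) for item in set(the_list)]
    # ...then sort the finished list in place; distinct items mean the
    # tuples are ordered by their first element alone.
    result.sort()
    return result


the_list = [5, 7, 6, 5, 5, 4, 4, 7, 5, 4]
assert dupli_original(the_list) == dupli_sort_after(the_list)
print(dupli_sort_after(the_list))  # [(4, 3), (5, 4), (6, 1), (7, 2)]
```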

edit: Here's another approach that is 35% faster than your original:

def dupli(the_list):
    counts = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    for n in the_list:
        counts[n] += 1
    return [(i, counts[i]) for i in (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12) if counts[i]]
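The same counting-array idea can be written without hard-coding thirteen zeros and thirteen indices. In this sketch, `dupli_buckets` and its `max_value` parameter are hypothetical names, and whether `enumerate` is as fast as the tuple literal above on a given interpreter has not been measured here:

```python
def dupli_buckets(the_list, max_value=12):
    """Counting-array version of dupli for integers in 0..max_value.

    Assumes every element is an int with 0 <= n <= max_value. Values above
    max_value raise IndexError; negative values are NOT detected, because
    Python's negative indexing silently puts them in a bucket near the end.
    """
    counts = [0] * (max_value + 1)  # one bucket per possible value
    for n in the_list:
        counts[n] += 1
    # enumerate walks the buckets in ascending order, so the output is
    # already sorted by item; empty buckets are skipped.
    return [(i, c) for i, c in enumerate(counts) if c]


the_list = [5, 7, 6, 5, 5, 4, 4, 7, 5, 4]
print(dupli_buckets(the_list))  # [(4, 3), (5, 4), (6, 1), (7, 2)]
```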

Note: You may want to randomize the values for the_list. My final version of dupli tests even faster with other random data sets (import random; the_list = [random.randint(0, 12) for i in range(10)]).
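Before timing on random data, it is worth checking that the fast version still returns the same answer as the original. A sketch of such a cross-check, assuming values in 0..12 as in the question; the fixed seed and the 1000 trials are arbitrary choices, and `dupli_original` / `dupli_counts` are hypothetical names for the question's function and the counting-array version:

```python
import random


def dupli_original(the_list):
    # The question's version: count each distinct item, sorted by item.
    return [(item, the_list.count(item)) for item in sorted(set(the_list))]


def dupli_counts(the_list):
    # The counting-array version from this answer, for values 0..12.
    counts = [0] * 13
    for n in the_list:
        counts[n] += 1
    return [(i, counts[i]) for i in range(13) if counts[i]]


random.seed(0)  # fixed seed so the check is reproducible
for _ in range(1000):
    # Same data shape as the note above: 10 random ints between 0 and 12.
    the_list = [random.randint(0, 12) for i in range(10)]
    assert dupli_original(the_list) == dupli_counts(the_list)
print("all random lists agree")
```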
