Python：使用dict comprehension / generator在列表中计数出现次数 [英] Python: count occurrences in a list using dict comprehension/generator

查看：519 发布时间：2017/5/21 23:32:57 python performance dictionary generator

本文介绍了Python：使用dict comprehension / generator在列表中计数出现次数的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我想编写一些测试来分析python中不同操作的效率，即字典理解与dict生成器的比较。

为了测试出来，我以为我会尝试一个简单的例子：使用字典计算列表中的单词数量。

现在我知道你可以使用集合.Counter （根据这里的答案：如何计算Python中的列表项的出现？），但我的目标是测试性能记忆。

一个长手的方法是在基本循环中进行。

 从pprint import pprint 
 
＃读取一些文本创建示例数据
与open（'text.txt'）为f：
 words = f.read（）。split（）
 
 dict1 = {} 
 for w in words：
 if not dict1.get（w）：
 dict1 [w] = 1 
 else：
 dict1 [w] + = 1 
 pprint（dict1）

结果：

  {'a'：62，
'aback'：1，
'able'：1，
'废除'：2，
'about'：6，
'accept'：1，
'accepted'：1，
'accord'：1，
'根据'：1，
'across'：1，
 ...

然后我有一个尝试在字典理解中做同样的事情：

  dict2 = {w：1如果不是dict2.get（w）else dict2.get（w）+ 1 
 for w in words}

我有一个错误：

  NameError：全局名'dict2'未定义

我尝试定义前提：

 code> dict2 = {} 
 dict2 = {w：1 if not dict2.get（w）else dict2.get（w）+ 1 
 for w in words} 
 pprint （dict2）

但当然，这些计数都设置为1 ：

  {'a'：1，
'aback'：1，
'able' 1，
'abolished'：1，
'about'：1，
'accept'：1，
'accepted'：1，
' 1，
'根据'：1，
'跨'：1，
 ...

我有一个类似于dict理解的问题：

  dict3 = dict（（w，1 if不是dict2.get（w）else dict2.get（w）+ 1）
 for w in words）

所以我的问题是：如何最有效地使用字典理解/生成器来计算列表中的出现次数？

 
 
  更新：@Rawing提出了一种替代方法 {word：word.count（word）for set in set（words）} ，但这将规避我正在尝试的机制
解决方案
使用dict-comprehension，您无法有效地（至少在内存方面）这样做，因为那么你'二必须跟踪另一个字典中的当前计数，即更多的内存消耗。以下是您可以使用dict-comprehension（不推荐使用： - ））的方法：
 >> ; words = list（'asdsadDASDFASCSAASAS'）
>>> dct = {} 
>>>> {w：1如果不在dct而不是dct.update（{w：1}）
 else dct [w] + 1 
如果不是dct.update（{w：dct [w] 1}）else 1 for w in words} 
>>>> dct 
 {'a'：2，'A'：5，''：2，'d'：2，'F'：1，'C'：1，'S' '：2} 
  
另一种方法是先排序单词列表，然后使用 itertools.groupby ，然后计算每个组的长度。这里的dict-comprehension可以转换为一个生成器，如果你想要的，但是，这将需要先阅读内存中的所有单词：
  from itertools import groupby 
 words.sort（）
 dct = {k：sum（1 for _ in g）for k，g in groupby（words）} 
  
请注意，这个批次的最快的是 collections.defaultdict ：
  d = defaultdict（int）
表示w中的单词：d [w] + = 1 
  
 计时比较： 
 >>>从字符串导入ascii_letters，数字
>>> ％timeit words = list（ascii_letters + digits）* 10 ** 4; words.sort（）; {k：group in（word）中的k的和（在g中为1）} 
 10循环，最好为3：131 ms每循环
>>>> ％timeit words = list（ascii_letters + digits）* 10 ** 4;计数器（单词）
 10循环，最佳3：169 ms每循环
>>>> ％timeit words = list（ascii_letters + digits）* 10 ** 4; dct = {}; {w：1如果w不在dct而不是dct.update（{w：1}）else dct [w] + 1如果不是dct.update（{w：dct [w] + 1}）else 1 for w单词} 
 1循环，最佳3：315 ms每循环
>>>> %% timeit 
 ... words = list（ascii_letters + digits）* 10 ** 4 
 ... d = defaultdict（int）
 ... for w in words：d [ w] + = 1 
 ... 
 10循环，最好3：57.1 ms每循环
>>>> %% timeit 
 words = list（ascii_letters + digits）* 10 ** 4 
d = {} 
表示w：d [w] = d.get（w，0）+ 1 
 ... 
 10循环，最好3：108 ms每循环
 
 #Increase输入大小
 
>>> ％timeit words = list（ascii_letters + digits）* 10 ** 5; words.sort（）;对于k中的k而言，k（k）中的和k（g），在groupby（单词）中的g）g 
 1循环，最好为3：1.44 s每循环
>>> ％timeit words = list（ascii_letters + digits）* 10 ** 5; Counter（words）
 1循环，最好3：1.7 s每循环
>>>> ％timeit words = list（ascii_letters + digits）* 10 ** 5; dct = {}; {w：1如果w不在dct而不是dct.update（{w：1}）else dct [w] + 1如果不是dct.update（{w：dct [w] + 1}）else 1 for w单词} 
 
 1循环，最好3：3.19 s每循环
>>> %% timeit 
 words = list（ascii_letters + digits）* 10 ** 5 
d = defaultdict（int）
 for w in words：d [w] + = 1 
。 .. 
 1循环，最佳3：571 ms每循环
>>>> %% timeit 
 words = list（ascii_letters + digits）* 10 ** 5 
d = {} 
表示w：d [w] = d.get（w，0）+ 1 
 ... 
 1循环，最好3：1.1 s每循环
  
 
I want to write some tests to analyse the efficiency of different operations in python, namely a comparison of dictionary comprehensions and dict generators.

To test this out, I thought I would try a simple example: count the number of words in a list using dictionaries.

Now I know that you can do this using collections.Counter (as per an answer here: How can I count the occurrences of a list item in Python?), but my objective was to test performance an memory.

One "long-hand" way is to do it in a basic loop.
from pprint import pprint

# Read in some text to create example data
with open('text.txt') as f:
    words = f.read().split()

dict1 = {}
for w in words:
    if not dict1.get(w):
        dict1[w] = 1
    else:
        dict1[w] += 1
pprint(dict1)
The result:
{'a': 62,
 'aback': 1,
 'able': 1,
 'abolished': 2,
 'about': 6,
 'accept': 1,
 'accepted': 1,
 'accord': 1,
 'according': 1,
 'across': 1,
 ...
Then I got a bit stuck trying to do the same in a dictionary comprehension:
dict2  = { w: 1 if not dict2.get(w) else dict2.get(w) + 1
            for w in words }
I got an error:
NameError: global name 'dict2' is not defined
I tried defining the dict up front:
dict2 = {}
dict2  = { w: 1 if not dict2.get(w) else dict2.get(w) + 1
            for w in words }
pprint(dict2)
But of course the counts are all set to 1:
{'a': 1,
 'aback': 1,
 'able': 1,
 'abolished': 1,
 'about': 1,
 'accept': 1,
 'accepted': 1,
 'accord': 1,
 'according': 1,
 'across': 1,
 ...
I had a similar problem with dict comprehension:
dict3 = dict( (w, 1 if not dict2.get(w) else dict2.get(w) + 1)
                for w in words)
So my question is: how can I use a dictionary comprehension/generator most efficiently to count the number of occurrences in a list?

Update: @Rawing suggested an alternative approach {word:words.count(word) for word in set(words)} but that would circumvent the mechanism I am trying to test.
 解决方案 
You cannot do this efficiently(at least in terms of memory) using a dict-comprehension, because then you'll have to keep track of current count in another dictionary i.e more memory consumption. Here's how you can do it using a dict-comprehension(not recommended at all :-)):
>>> words = list('asdsadDASDFASCSAASAS')
>>> dct = {}
>>> {w: 1 if w not in dct and not dct.update({w: 1})
                  else dct[w] + 1
                  if not dct.update({w: dct[w] + 1}) else 1 for w in words}
>>> dct
{'a': 2, 'A': 5, 's': 2, 'd': 2, 'F': 1, 'C': 1, 'S': 5, 'D': 2}
Another way will be to sort the words list first then group them using itertools.groupby and then count the length of each group. Here the dict-comprehension can be converted to a generator if you want, but yes this will require reading all words in memory first:
from itertools import groupby
words.sort()
dct = {k: sum(1 for _ in g) for k, g in groupby(words)}
Note that the fastest one of the lot is collections.defaultdict:
d = defaultdict(int)
for w in words: d[w] += 1 
Timing comparisons:
>>> from string import ascii_letters, digits
>>> %timeit words = list(ascii_letters+digits)*10**4; words.sort(); {k: sum(1 for _ in g) for k, g in groupby(words)}
10 loops, best of 3: 131 ms per loop
>>> %timeit words = list(ascii_letters+digits)*10**4; Counter(words)
10 loops, best of 3: 169 ms per loop
>>> %timeit words = list(ascii_letters+digits)*10**4; dct = {}; {w: 1 if w not in dct and not dct.update({w: 1}) else dct[w] + 1 if not dct.update({w: dct[w] + 1}) else 1 for w in words}
1 loops, best of 3: 315 ms per loop
>>> %%timeit
... words = list(ascii_letters+digits)*10**4
... d = defaultdict(int)
... for w in words: d[w] += 1
... 
10 loops, best of 3: 57.1 ms per loop
>>> %%timeit
words = list(ascii_letters+digits)*10**4
d = {}
for w in words: d[w] = d.get(w, 0) + 1
... 
10 loops, best of 3: 108 ms per loop

#Increase input size 

>>> %timeit words = list(ascii_letters+digits)*10**5; words.sort(); {k: sum(1 for _ in g) for k, g in groupby(words)}
1 loops, best of 3: 1.44 s per loop
>>> %timeit words = list(ascii_letters+digits)*10**5; Counter(words)
1 loops, best of 3: 1.7 s per loop
>>> %timeit words = list(ascii_letters+digits)*10**5; dct = {}; {w: 1 if w not in dct and not dct.update({w: 1}) else dct[w] + 1 if not dct.update({w: dct[w] + 1}) else 1 for w in words}

1 loops, best of 3: 3.19 s per loop
>>> %%timeit
words = list(ascii_letters+digits)*10**5
d = defaultdict(int)
for w in words: d[w] += 1
... 
1 loops, best of 3: 571 ms per loop
>>> %%timeit
words = list(ascii_letters+digits)*10**5
d = {}
for w in words: d[w] = d.get(w, 0) + 1
... 
1 loops, best of 3: 1.1 s per loop


                        
这篇关于Python：使用dict comprehension / generator在列表中计数出现次数的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！


                    
                        查看全文

Python：使用dict comprehension / generator在列表中计数出现次数 [英] Python: count occurrences in a list using dict comprehension/generator

问题描述

相关文章

Python最新文章

热门教程

热门工具

登录关闭

Python：使用dict comprehension / generator在列表中计数出现次数 [英] Python: count occurrences in a list using dict comprehension/generator

问题描述

相关文章

Python最新文章

热门教程

热门工具

登录 关闭

登录关闭