Efficiently calculate word frequency in a string
Question
I am parsing a long string of text in Python and counting the number of times each word occurs. I have a function that works, but I am looking for advice on whether it can be made more efficient (in terms of speed), and whether there are Python library functions that could do this for me so I'm not reinventing the wheel.
Can you suggest a more efficient way to find the most common words in a long string (usually over 1000 words)?
Also, what is the best way to sort the dictionary into a list where the first element is the most common word, the second element is the second most common word, and so on?
test = """abc def-ghi jkl abc
abc"""

def calculate_word_frequency(s):
    # Post: return a list of words ordered from the most
    # frequent to the least frequent
    words = s.split()
    freq = {}
    for word in words:
        if word in freq:
            freq[word] += 1
        else:
            freq[word] = 1
    return sort(freq)

def sort(d):
    # Post: sort dictionary d into list of words ordered
    # from highest freq to lowest freq
    # eg: For {"the": 3, "a": 9, "abc": 2} should be
    # sorted into the following list ["a","the","abc"]
    # I have never used lambdas, so I'm not sure this is correct
    return sorted(d, key=lambda word: d[word], reverse=True)

print(calculate_word_frequency(test))
Recommended answer
Use collections.Counter:
>>> from collections import Counter
>>> test = 'abc def abc def zzz zzz'
>>> Counter(test.split()).most_common()
[('abc', 2), ('def', 2), ('zzz', 2)]
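To get the plain list of words the question asks for (most frequent first, without the counts), the result of most_common() can be unpacked. The following is a minimal sketch, assuming Python 3.7 or later and the same whitespace-based splitting as the question's code. It reuses the function name calculate_word_frequency from the question for illustration only.

```python
from collections import Counter

def calculate_word_frequency(s):
    """Return the words of s ordered from most to least frequent."""
    # Counter tallies each whitespace-separated token in a single pass.
    counts = Counter(s.split())
    # most_common() yields (word, count) pairs sorted by descending count;
    # keep only the words.
    return [word for word, _ in counts.most_common()]

test = """abc def-ghi jkl abc
abc"""
print(calculate_word_frequency(test))
```

On Python 3.7+ this should print ['abc', 'def-ghi', 'jkl'], since words with equal counts keep the order in which they were first seen. Note that s.split() treats "def-ghi" as a single word; if punctuation should separate words, a different tokenization step would be needed.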