Counting word frequency and making a dictionary from it
Problem description
I want to take every word from a text file and count the word frequency in a dictionary.
Example: 'This is the text file, and it is used to take words and count'
d = {'this': 1, 'is': 2, 'the': 1, ...}
I haven't gotten very far, and I just can't see how to complete it. My code so far:
import sys
argv = sys.argv[1]
data = open(argv)
words = data.read()
data.close()
wordfreq = {}
for i in words:
    # there should be a counter and somehow it must fill the dict.
Recommended answer
If you don't want to use collections.Counter, you can write your own function:
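import sys

filename = sys.argv[1]
with open(filename) as fp:  # the file is closed automatically
    data = fp.read()
words = data.split()  # split on any run of whitespace

unwanted_chars = ".,-_"  # and so on: add any other punctuation to strip
wordfreq = {}
for raw_word in words:
    # remove unwanted characters from both ends of the token
    word = raw_word.strip(unwanted_chars)
    if not word:  # the token was nothing but punctuation
        continue
    if word not in wordfreq:
        wordfreq[word] = 0
    wordfreq[word] += 1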
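For comparison, here is a minimal sketch of the collections.Counter route mentioned above. The helper name count_words, the default set of stripped characters, and the sample sentence are assumptions made for illustration; they are not part of the original answer.

```python
from collections import Counter


def count_words(text, unwanted_chars=".,-_"):
    """Return a Counter mapping each stripped word to its frequency.

    count_words and its default unwanted_chars are hypothetical names
    chosen for this sketch.
    """
    # Strip unwanted characters from both ends of every whitespace-separated token.
    stripped = (raw.strip(unwanted_chars) for raw in text.split())
    # Drop tokens that became empty, then let Counter do the tallying.
    return Counter(word for word in stripped if word)


sample = "this is the file, and this is the text."
wordfreq = count_words(sample)
print(dict(wordfreq))
```

A Counter is a dict subclass, so wordfreq["this"] works as with the hand-written dictionary, and a missing word yields 0 instead of raising KeyError.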
For finer control over what counts as a word, look at regular expressions.
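As a rough illustration of the regular-expression approach, the sketch below extracts words with re.findall instead of splitting and stripping. The pattern (lowercase letters, digits, and apostrophes), the lowercasing step, and the function name regex_wordfreq are all assumptions; what counts as a "word" depends on your text.

```python
import re

# Assumed definition of a word: a run of lowercase letters, digits, or apostrophes.
WORD_RE = re.compile(r"[a-z0-9']+")


def regex_wordfreq(text):
    """Count words in text, ignoring case and surrounding punctuation."""
    wordfreq = {}
    # Lowercase first so "The" and "the" are counted as the same word.
    for word in WORD_RE.findall(text.lower()):
        wordfreq[word] = wordfreq.get(word, 0) + 1
    return wordfreq


freq = regex_wordfreq("The cat -- the CAT! -- isn't here.")
print(freq)
```

Because the pattern only matches word characters, punctuation-only tokens such as "--" never reach the dictionary, so no separate stripping step is needed.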