Read words from a .txt file and count each word
Problem description
I wonder how to read character strings the way fscanf does. I need to read every word in a .txt file and keep a count for each word.
import collections

collectwords = collections.defaultdict(int)
with open('DatoSO.txt', 'r') as filetxt:
    for line in filetxt:
        v = ""
        for char in line:
            if char != " ":
                v = v + char
            else:
                # A word is only counted when a space follows it.
                collectwords[v] += 1
                v = ""
This way, I can't read the last word.
Recommended answer
You might also consider using collections.Counter if you are using Python >= 2.7:
http://docs.python.org/library/collections.html#collections.Counter
It adds a number of methods, such as most_common, which might be useful in this type of application.
From Doug Hellmann's PyMOTW:
import collections

c = collections.Counter()
with open('/usr/share/dict/words', 'rt') as f:
    for line in f:
        # Updating with a string counts its individual characters.
        c.update(line.rstrip().lower())

print('Most common:')
for letter, count in c.most_common(3):
    print('%s: %7d' % (letter, count))
From http://www.doughellmann.com/PyMOTW/collections/counter.html, although this counts letters rather than words. In the c.update line, you would want to replace line.rstrip().lower() with line.split(), and perhaps add some code to get rid of punctuation.
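The change described above can be sketched as follows. This is a minimal illustration, not the answerer's exact code: the helper name count_words and the in-memory sample text are assumptions made so the example is self-contained. Because split() with no arguments breaks on any run of whitespace, including the trailing newline, the last word on each line is counted too, which is what the character-by-character loop in the question misses.

```python
import collections
import io


def count_words(lines):
    """Count whitespace-separated words across an iterable of lines."""
    counts = collections.Counter()
    for line in lines:
        # split() with no arguments splits on any whitespace run and
        # discards the trailing newline, so the last word is kept.
        counts.update(line.split())
    return counts


# Hypothetical in-memory sample standing in for the .txt file.
sample = io.StringIO("the cat sat\nthe dog sat\n")
word_counts = count_words(sample)

for word, count in word_counts.most_common(2):
    print('%s: %d' % (word, count))
```

An open file object can be passed in place of the sample, since iterating over a file yields its lines.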
Edit: To remove punctuation, this is probably the fastest solution:
import collections
import string

c = collections.Counter()
with open('DataSO.txt', 'rt') as f:
    for line in f:
        # Python 2 form of str.translate: the second argument lists characters to delete.
        c.update(line.translate(string.maketrans("", ""), string.punctuation).split())
(Borrowed from the question about removing punctuation from a string in Python.)
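The two-argument call line.translate(table, deletechars) shown above exists only on Python 2 byte strings. On Python 3 the same idea can be written with the three-argument form of str.maketrans, whose third argument lists the characters to delete. The sketch below assumes Python 3; the helper name and the sample lines are hypothetical and only there to make the example self-contained.

```python
import collections
import string

# Translation table that deletes every ASCII punctuation character.
# The third argument to str.maketrans lists characters to remove.
STRIP_PUNCTUATION = str.maketrans("", "", string.punctuation)


def count_words_without_punctuation(lines):
    """Count words per line after deleting ASCII punctuation."""
    counts = collections.Counter()
    for line in lines:
        counts.update(line.translate(STRIP_PUNCTUATION).split())
    return counts


# Hypothetical sample lines standing in for the contents of the .txt file.
sample_lines = ["Hello, world!\n", "hello world.\n"]
result = count_words_without_punctuation(sample_lines)
print(result)
```

Note that string.punctuation covers ASCII punctuation only and includes the apostrophe, so a word like "can't" would be counted as "cant"; counting is also case-sensitive unless the line is lowercased first.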