Read words from a .txt file and count each word

Question

I wonder how to read character strings the way fscanf does. I need to read every word in a .txt file, and I need a count for each word.

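import collections

collectwords = collections.defaultdict(int)

with open('DatoSO.txt', 'r') as filetxt:
    for line in filetxt:
        v = ""
        for char in line:
            if char != " ":
                # Build up the current word one character at a time
                v = v + char
            else:
                # A space ends the word, so count it and start a new one
                collectwords[v] += 1
                v = ""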

This way, I can't read the last word.

Recommended answer

You might also consider using collections.Counter if you are using Python >= 2.7:

http://docs.python.org/library/collections.html#collections.Counter

It adds a number of methods, such as 'most_common', which might be useful in this type of application.

From Doug Hellmann's PyMOTW:

import collections

c = collections.Counter()
with open('/usr/share/dict/words', 'rt') as f:
    for line in f:
        # Updating with a string counts its individual characters
        c.update(line.rstrip().lower())

print('Most common:')
for letter, count in c.most_common(3):
    print('%s: %7d' % (letter, count))

http://www.doughellmann.com/PyMOTW/collections/counter.html (although this does letter counts instead of word counts). In the c.update line, you would want to replace line.rstrip().lower() with line.split(), and perhaps add some code to get rid of punctuation.
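The substitution described above can be sketched as follows. This is a minimal illustration rather than the PyMOTW code itself; the function name count_words and the sample lines are hypothetical. Calling split() with no arguments also sidesteps the last-word problem from the question, because it does not depend on a space following each word.

```python
import collections


def count_words(lines):
    """Count whitespace-separated words across an iterable of text lines."""
    counts = collections.Counter()
    for line in lines:
        # split() with no arguments splits on any run of whitespace and
        # ignores the trailing newline, so the last word of a line is kept.
        counts.update(line.split())
    return counts


# Hypothetical in-memory sample; a file object from open() can be passed the same way.
sample = ["the quick brown fox\n", "jumps over the lazy dog"]
word_counts = count_words(sample)
print(word_counts.most_common(1))
```

To count the words of a real file, something like `with open('DatoSO.txt') as f: count_words(f)` should work, since a file object yields its lines when iterated.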

Edit: To remove punctuation, here is probably the fastest solution:

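import collections
import string

c = collections.Counter()
with open('DataSO.txt', 'rt') as f:
    for line in f:
        # Python 2 only: translate(table, deletechars) with an identity table
        # deletes every punctuation character before the line is split
        c.update(line.translate(string.maketrans("",""), string.punctuation).split())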

(Borrowed from the following question: removing punctuation from a string in Python)
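The snippet above relies on the Python 2 form of str.translate. On Python 3, string.maketrans no longer exists and translate takes a single table argument, so a rough equivalent might look like the sketch below. The names STRIP_PUNCTUATION and count_words_no_punctuation, and the sample lines, are hypothetical.

```python
import collections
import string

# Table that deletes every ASCII punctuation character. The third argument
# of str.maketrans lists the characters to remove.
STRIP_PUNCTUATION = str.maketrans("", "", string.punctuation)


def count_words_no_punctuation(lines):
    """Count words in an iterable of lines after stripping ASCII punctuation."""
    counts = collections.Counter()
    for line in lines:
        counts.update(line.translate(STRIP_PUNCTUATION).split())
    return counts


# Hypothetical sample lines; a file object from open() can be passed instead.
sample = ["Hello, world!\n", "hello world."]
print(count_words_no_punctuation(sample))
```

Note that string.punctuation includes the apostrophe, so a word like "don't" is counted as "dont", and that "Hello" and "hello" stay separate entries unless the line is lowercased first.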
