Word count from a txt file program
Problem description
I am counting the words in a txt file with the following code:
#!/usr/bin/python
file=open("D:\\zzzz\\names2.txt","r+")
wordcount={}
for word in file.read().split():
    if word not in wordcount:
        wordcount[word] = 1
    else:
        wordcount[word] += 1
print (word,wordcount)
file.close();
This gives me output like this:
>>>
goat {'goat': 2, 'cow': 1, 'Dog': 1, 'lion': 1, 'snake': 1, 'horse': 1, '': 1, 'tiger': 1, 'cat': 2, 'dog': 1}
But I want the output in the following format:
word wordcount
goat 2
cow 1
dog 1.....
Also, I am getting an extra symbol in the output (the entry shown as '' above). How can I remove it?
Solution
The funny symbols you're encountering are a UTF-8 BOM (Byte Order Mark). To get rid of them, open the file using the correct encoding (I'm assuming you're on Python 3):
file = open(r"D:\zzzz\names2.txt", "r", encoding="utf-8-sig")
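To see what the encoding change does, here is a minimal sketch. It assumes a hypothetical temporary file whose contents start with the three BOM bytes (EF BB BF); the file name and sample words are made up for illustration. Reading with "utf-8" leaves the BOM attached to the first word, while "utf-8-sig" strips it.

```python
import os
import tempfile

# Hypothetical sample data: a UTF-8 file that begins with a BOM (EF BB BF).
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(b"\xef\xbb\xbfgoat cow goat")
    path = tmp.name

try:
    # Plain "utf-8" keeps the BOM as the character U+FEFF.
    with open(path, "r", encoding="utf-8") as f:
        plain_words = f.read().split()
    # "utf-8-sig" drops a leading BOM if one is present.
    with open(path, "r", encoding="utf-8-sig") as f:
        sig_words = f.read().split()
finally:
    os.remove(path)

print(plain_words)  # the first word still carries the BOM character
print(sig_words)    # the first word is clean
```

Because the BOM is not whitespace, split() does not separate it from the first word, so that word would be counted under a different key than later occurrences of the same word.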
Furthermore, for counting, you can use collections.Counter:
from collections import Counter
wordcount = Counter(file.read().split())
Display them with:
>>> for item in wordcount.items(): print("{}\t{}".format(*item))
...
snake 1
lion 2
goat 2
horse 3
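Putting the pieces together, the following is one possible sketch of a script that prints the table with the header row asked for in the question. The helper names count_words and format_counts and the sample text are hypothetical, introduced only for this example. Sorting with most_common() is an extra choice, not part of the answer above.

```python
from collections import Counter


def count_words(text):
    """Count whitespace-separated words in a string."""
    return Counter(text.split())


def format_counts(counts):
    """Return a tab-separated table with a header row, most frequent first."""
    lines = ["word\twordcount"]
    for word, count in counts.most_common():
        lines.append("{}\t{}".format(word, count))
    return "\n".join(lines)


# Hypothetical sample text standing in for the contents of names2.txt.
sample = "goat cow goat dog"
print(format_counts(count_words(sample)))
```

To use this on the real file, read it with open(r"D:\zzzz\names2.txt", "r", encoding="utf-8-sig") inside a with block and pass the result of f.read() to count_words. Counting is case-sensitive, so 'Dog' and 'dog' stay separate as in the output shown in the question. Calling text.lower() before split() would merge them if that is preferred.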