Python Counter from txt file
Question
I would like to initialize a collections.Counter object from a text file of word frequency counts. That is, I have a file "counts.txt":
rank wordform abs r mod
1 the 225300 29 223066.9
2 and 157486 29 156214.4
3 to 134478 29 134044.8
...
999 fallen 345 29 326.6
1000 supper 368 27 325.8
I would like a Counter object wordCounts such that I can call
>>> print(wordCounts.most_common(3))
[('the', 225300), ('and', 157486), ('to', 134478)]
What is the most efficient, Pythonic way to do this?
Recommended answer
Here are two versions. The first treats your counts.txt as a regular text file. The second treats it as a CSV file (which is roughly what it looks like).
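from collections import Counter

with open('counts.txt') as f:
    # Split each line on whitespace: rank, wordform, abs, r, mod
    lines = [line.strip().split() for line in f]

# Skip the header row and map each wordform to its absolute count
wordCounts = Counter({line[1]: int(line[2]) for line in lines[1:]})
print(wordCounts.most_common(3))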
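If the file is large, or may contain blank or placeholder rows (such as the literal `...` line in the sample), a streaming variant avoids building the intermediate list. The following is a minimal sketch, not part of the original answer: the helper name `load_counts` and the rule for skipping malformed rows are assumptions made for illustration.

```python
from collections import Counter


def load_counts(path):
    """Build a Counter of wordform -> abs from a whitespace-delimited file.

    Hypothetical helper: assumes the first line is a header and that the
    columns are ordered rank, wordform, abs, r, mod.
    """
    counts = Counter()
    with open(path) as f:
        next(f, None)  # skip the header row, if there is one
        for line in f:
            fields = line.split()
            # Skip blank or malformed rows (e.g. a literal "..." line)
            if len(fields) < 3 or not fields[2].isdigit():
                continue
            # += so that a word listed twice has its counts summed
            counts[fields[1]] += int(fields[2])
    return counts
```

With this in place, `wordCounts = load_counts('counts.txt')` followed by `wordCounts.most_common(3)` gives the same kind of result as the version above.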
If your data file turns out to be delimited by some consistent character or string, you could use a csv.DictReader object to parse the file.
Shown below is how it could be done if your file were TAB delimited.
The data file (as edited by me to be TAB delimited)
rank wordform abs r mod
1 the 225300 29 223066.9
2 and 157486 29 156214.4
3 to 134478 29 134044.8
999 fallen 345 29 326.6
1000 supper 368 27 325.8
The code
from csv import DictReader
from collections import Counter

with open('counts.txt') as f:
    # The header row supplies the keys: rank, wordform, abs, r, mod
    reader = DictReader(f, delimiter='\t')
    # Consume the reader while the file is still open
    wordCounts = Counter({row['wordform']: int(row['abs']) for row in reader})

print(wordCounts.most_common(3))
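To try the DictReader approach without creating a file on disk, the same parsing can be run against an in-memory string. This is only an illustrative sketch: the `sample` text reproduces the TAB-delimited rows shown above, and `io.StringIO` stands in for the opened file.

```python
import io
from collections import Counter
from csv import DictReader

# In-memory stand-in for the TAB-delimited counts.txt shown above
sample = (
    "rank\twordform\tabs\tr\tmod\n"
    "1\tthe\t225300\t29\t223066.9\n"
    "2\tand\t157486\t29\t156214.4\n"
    "3\tto\t134478\t29\t134044.8\n"
    "999\tfallen\t345\t29\t326.6\n"
    "1000\tsupper\t368\t27\t325.8\n"
)

# DictReader takes any iterable of lines, so a StringIO works like a file
reader = DictReader(io.StringIO(sample), delimiter='\t')
wordCounts = Counter({row['wordform']: int(row['abs']) for row in reader})

print(wordCounts.most_common(3))
# [('the', 225300), ('and', 157486), ('to', 134478)]
```

Because Counter is a dict subclass, lookups such as `wordCounts['supper']` work as usual, and a word that is absent returns 0 rather than raising KeyError.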