Creating a dictionary of word counts for multiple text files in a directory

Question

I am calling build_dict() from word_count_directory() to build a dictionary of word counts for three files in a directory. I want to create three dictionaries, one per file, and have each one update the previous dictionary. My code creates a single dictionary (wordcount) that combines all three dictionaries at once. How can I accomplish this?

def build_dict(filename):
   f = open(filename, 'r')  # the 'U' mode flag is no longer accepted by recent Python 3 releases
   words = f.read().split()
   count = {}

   for word in words:
      word = word.lower()
      if word not in count:
        count[word] = 1
      else:
        count[word] += 1

   f.close()

   return count
# print(build_dict("C:\\Users\\Phil2040\\Desktop\\word_count\\news1.txt"))

import os
import os.path
def word_count_directory(directory):
    wordcount={}
    filelist=[os.path.join(directory,f) for f in os.listdir(directory)]
    for file in filelist:
       wordcount=build_dict(file)  # calling build_dict function
    return wordcount
print(word_count_directory("C:\\Users\\Phil2040\\Desktop\\Word_count"))


Recommended answer

Use collections.Counter.

Example files:

/tmp/foo.txt

hello world
hello world
foo bar
foo bar baz

/tmp/bar.txt

hello world
hello world
foo bar
foo bar baz
foo foo foo

You can create one Counter per file, then add them together!

from collections import Counter

def word_count(filename):
    with open(filename, 'r') as f:
        c = Counter()
        for line in f:
            c.update(line.split())  # split() with no argument ignores repeated spaces and blank lines
        return c

files = ['/tmp/foo.txt', '/tmp/bar.txt']
counters = [word_count(filename) for filename in files]

# counters content (example):
# [Counter({'world': 2, 'foo': 2, 'bar': 2, 'hello': 2, 'baz': 1}),
#  Counter({'foo': 5, 'world': 2, 'bar': 2, 'hello': 2, 'baz': 1})]

# Add all the word counts together:
total = sum(counters, Counter())  # sum needs an empty counter to start with

# total content (example):
# Counter({'foo': 7, 'world': 4, 'bar': 4, 'hello': 4, 'baz': 2})
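
To tie this back to the directory case in the question, here is a minimal sketch that keeps one Counter per file and also folds each one into a running total. The (per_file, total) return shape and the choice to lowercase words (as build_dict() does) are assumptions made for illustration, not part of the original answer.

```python
import os
from collections import Counter


def word_count(filename):
    """Count lowercased, whitespace-separated words in one text file."""
    counts = Counter()
    with open(filename, 'r') as f:
        for line in f:
            counts.update(line.lower().split())
    return counts


def word_count_directory(directory):
    """Return (per_file, total): one Counter per file plus a running total.

    The two-value return shape is an assumption for this sketch.
    """
    per_file = {}
    total = Counter()
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories
        per_file[name] = word_count(path)
        total.update(per_file[name])  # adds counts instead of overwriting them
    return per_file, total
```

In the question's loop, wordcount = build_dict(file) rebinds the name on every iteration, so only the last file's counts are returned. Counter.update() adds the new counts to the existing ones, which is what lets the total accumulate across files.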
