How to count word frequency across multiple documents in Python


Problem description

I have a dictionary 'd' holding the paths of multiple text files:

'd:/individual-articles/9.txt', 'd:/individual-articles/11.txt', 'd:/individual-articles/12.txt',...

and so on...

Now, I need to read each file listed in the dictionary and keep a count of the occurrences of every word across the entire collection.

My output should be of the form:

the-500

a-78

in-56

where 500 is the number of times the word "the" occurs across all the files in the dictionary, and so on.

I need to do this for all the words.

I am a Python newbie, please help!

My code below doesn't work; it shows no output! There must be a mistake in my logic, please rectify!

import collections
import itertools
import os
from glob import glob
from collections import Counter

folderpaths = 'd:/individual-articles'
counter = Counter()

filepaths = glob(os.path.join(folderpaths, '*.txt'))

folderpath = 'd:/individual-articles/'
# i am creating my dictionary here, can be ignored
d = collections.defaultdict(list)
with open('topics.txt') as f:
    for line in f:
        value, *keys = line.strip().split('~')
        for key in filter(None, keys):
            if key == 'earn':
                d[key].append(folderpath + value + ".txt")

for key, value in d.items():
    print(value)

word_count_dict = {}

for file in d.values():
    with open(file, "r") as f:
        words = re.findall(r'\w+', f.read().lower())
        counter = counter + Counter(words)
        for word in words:
            word_count_dict[word].append(counter)

for word, counts in word_count_dict.values():
    print(word, counts)

Answer

Building on the Counter collection that you are already using:

from glob import glob
from collections import Counter
import os
import re

folderpaths = 'd:/individual-articles'
counter = Counter()

# collect every .txt file in the folder and accumulate its word counts
filepaths = glob(os.path.join(folderpaths, '*.txt'))
for file in filepaths:
    with open(file) as f:
        words = re.findall(r'\w+', f.read().lower())
        counter = counter + Counter(words)
print(counter)
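To print the results in the `word-count` format you asked for, `Counter.most_common()` returns `(word, count)` pairs sorted by descending count. A minimal sketch, using a counter pre-filled with the sample counts from your desired output (in your program, `counter` would be the one built from the files above):

```python
from collections import Counter

# sample counts taken from the question's desired output
counter = Counter({'the': 500, 'a': 78, 'in': 56})

# most_common() yields (word, count) pairs, highest count first
for word, count in counter.most_common():
    print('{}-{}'.format(word, count))
# prints:
# the-500
# a-78
# in-56
```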

