Python - reading files from directory: file not found in subdirectory (which is there)

Question

I am convinced this is a simple syntax problem, but I cannot figure out why my code:

import os
from collections import Counter
d = {}
for filename in os.listdir('testfilefolder'):
    f = open(filename,'r')
    d = (f.read()).lower()
    freqs = Counter(d)
    print(freqs)

will not work. It can apparently see into the 'testfilefolder' folder and knows the file is there, because the error message says that 'file2.txt' is not found. So it can find the file well enough to tell me that it cannot be found...

However, I can get this piece of code to work:

from collections import Counter
d = {}
f = open("testfilefolder/file2.txt",'r')
d = (f.read()).lower()
freqs = Counter(d)
print(freqs)

Bonus: is this a good way of doing what I am trying to do (read from files and count the frequencies of words)? This is my first day with Python (although I have some programming experience).

I have to say that I am liking Python!

Thanks

Brian

Recommended answer

As isedev pointed out, os.listdir() returns just the file names, not full or relative paths, so open(filename) looks for each file in the current working directory rather than in 'testfilefolder'. Another way to deal with this problem is to os.chdir() into the directory in question, then call os.listdir('.').
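For comparison, the path-joining approach can be sketched as follows. This is a minimal illustration rather than code from the question: the helper name read_all and the temporary directory are assumptions made up so the example runs on its own. It prepends the directory to each name returned by os.listdir() with os.path.join(), so open() no longer depends on the current working directory.

```python
import os
import tempfile


def read_all(folder):
    """Return {file name: lowercased contents} for regular files in folder.

    Hypothetical helper, for illustration only.
    """
    contents = {}
    for filename in sorted(os.listdir(folder)):
        # os.listdir() yields bare names, so rebuild the path before opening.
        path = os.path.join(folder, filename)
        if not os.path.isfile(path):
            continue  # skip subdirectories
        with open(path) as f:
            contents[filename] = f.read().lower()
    return contents


# Stand-in for 'testfilefolder': a throwaway directory holding one file.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, 'file2.txt'), 'w') as f:
        f.write('Hello World')
    demo_contents = read_all(demo_dir)
    print(demo_contents)
```

Compared with os.chdir(), this leaves the process's working directory untouched, which may matter if other code relies on relative paths.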

Secondly, it seems your goal is to count the frequency of words, not letters (characters). Counter(d) on a string counts individual characters, so you will need to break up the contents of each file into words first. I prefer to use a regular expression for this.
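The difference is easy to see on a short string. The sketch below uses a made-up sample sentence (an assumption, not data from the question) and the same kind of pattern as the solution further down:

```python
import re
from collections import Counter

text = "The cat and the hat"  # made-up sample text

# Passing a string to Counter iterates over its characters.
char_freqs = Counter(text.lower())

# Splitting into words first gives word frequencies instead.
words = re.findall(r"[a-z0-9']+", text.lower())
word_freqs = Counter(words)

print(char_freqs)
print(word_freqs)
```

Here char_freqs has entries such as 't' and ' ', while word_freqs has entries such as 'the' and 'cat'.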

Thirdly, your solution counts word frequencies for each file separately. If you ever need to do it across all files, create a Counter() object at the beginning, then call its update() method to tally the counts.
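As a small sketch of that tallying step, with made-up word lists standing in for the contents of two files (the file names and words are assumptions for illustration):

```python
from collections import Counter

# Made-up word lists standing in for two files.
words_by_file = {
    'file1.txt': ['spam', 'eggs', 'spam'],
    'file2.txt': ['eggs', 'ham'],
}

total = Counter()
for name, file_words in words_by_file.items():
    per_file = Counter(file_words)  # counts for this file only
    total.update(file_words)        # adds to the running totals
    print(name, per_file)

print(total)
```

Note that Counter.update() adds to the existing counts, unlike dict.update(), which would replace the values.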

Without further ado, my solution:

import collections
import re
import os

all_files_frequency = collections.Counter()

previous_dir = os.getcwd()
os.chdir('testfilefolder')
for filename in os.listdir('.'):
    with open(filename) as f:
        file_contents = f.read().lower()

    words = re.findall(r"[a-zA-Z0-9']+", file_contents) # Breaks up into words
    frequency = collections.Counter(words)              # For this file only
    all_files_frequency.update(words)                   # For all files
    print(frequency)

os.chdir(previous_dir)

print('')                   # Blank line before the combined totals
print(all_files_frequency)  # Parenthesized form runs on Python 2 and 3
