Open every file/subfolder in directory and print results to .txt file

Question

At the moment I am working with this code:

from bs4 import BeautifulSoup
import contextlib
import glob
import os
import re
import sys


@contextlib.contextmanager
def stdout2file(fname):
    # Send everything print() writes to fname while the with block runs
    f = open(fname, 'w', encoding="utf8")
    sys.stdout = f
    try:
        yield
    finally:
        # Restore console output and close the file, even if an error occurs
        sys.stdout = sys.__stdout__
        f.close()

def trade_spider():
    os.chdir(r"C:\Users\6930p\FLO'S DATEIEN\Master FAU\Sommersemester 2016\02_Masterarbeit\04_Testumgebung\01_Probedateien für Analyseaspekt\Independent Auditors Report")
    with stdout2file("output.txt"):
        # '**' together with recursive=True also matches files in subfolders
        for file in glob.iglob('**/*.html', recursive=True):
            with open(file, encoding="utf8") as f:
                contents = f.read()
            soup = BeautifulSoup(contents, "html.parser")
            for item in soup.find_all("ix:nonfraction"):
                if re.match(r".*AuditFeesExpenses", item['name']):
                    # Print only the file name, not its path
                    print(os.path.basename(file), end="| ")
                    print(item['name'], end="| ")
                    print(item.get_text())

trade_spider()

So far this works perfectly. But now I am stuck on another issue. If I search a folder that contains only files and no subfolders, this works without problems. However, if I run this code on a folder that has subfolders, it doesn't work (it prints nothing!). Furthermore, I would like my results printed into a .txt file without the whole path in them. The result should look like this:

Filename.html| RegEX Match| HTML text

I already get this result, but only in PyCharm and not in a separate .txt file.

To sum up, I have two questions:

  1. How can I walk through the subfolders of the directory I defined? Would os.walk() be an option?
  2. How can I print the results into a .txt file? Would sys.stdout solve this?

Any help appreciated on this issue!

UPDATE: It only prints the first result of the first file into my "output.txt" file (at least I think it is the first, as it is the last file in my only subfolder and recursive=True is set). Any idea why it is not looping through all the other files?

UPDATE_2: Question resolved! Final Code can be seen above!

Recommended answer

For walking in subdirectories, there are two options:

  1. Use ** with glob and the argument recursive=True (glob.glob('**/*.html', recursive=True)). This only works in Python 3.5+. If the directory tree is large, I would also recommend glob.iglob instead of glob.glob, since it yields matches one at a time instead of building the whole list first.
  2. Use os.walk and check the filenames (whether they end in ".html") manually or with fnmatch.filter.
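A minimal sketch of both options, assuming Python 3.5+; the helper names html_files_walk and html_files_glob are hypothetical. Both should find the same files, including those directly in the top-level folder, because ** also matches zero directories. One difference worth knowing: glob skips names starting with a dot by default, while os.walk does not.

```python
import fnmatch
import glob
import os


def html_files_walk(root):
    """Yield the paths of all .html files below root (os.walk + fnmatch.filter)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        # fnmatch.filter keeps only the names matching the shell-style pattern
        for name in fnmatch.filter(filenames, "*.html"):
            yield os.path.join(dirpath, name)


def html_files_glob(root):
    """Same search with glob; '**' plus recursive=True descends into subfolders."""
    return glob.iglob(os.path.join(root, "**", "*.html"), recursive=True)
```

Either function can replace glob.iglob('**/*.html', recursive=True) in the loop; os.path.basename(path) then gives just the file name for the output.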

Regarding the printing into a file, there are again several ways:

  1. Simply run the script and redirect standard output, i.e. python3 myscript.py > myfile.txt
  2. Replace calls to print with calls to the .write() method of a file object opened in write mode (note that .write() does not append a newline).
  3. Keep using print, but give it the argument file=myfile, where myfile is again a writable file object.
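For options 2 and 3, a minimal sketch that writes lines in the Filename.html| RegEX Match| HTML text format from the question. The rows argument and the helper names write_results and write_results_manual are hypothetical; in the real script the tuples would come from the BeautifulSoup loop. os.path.basename strips the directory part, so only the file name ends up in the output.

```python
import os


def write_results(rows, out_path):
    """Option 3: keep print(), but send it to an open file via file=.

    rows is a hypothetical iterable of (path, name, text) tuples.
    """
    with open(out_path, "w", encoding="utf8") as out:
        for path, name, text in rows:
            # sep="| " reproduces the "file| name| text" layout in one call
            print(os.path.basename(path), name, text, sep="| ", file=out)


def write_results_manual(rows, out_path):
    """Option 2: build each line yourself and pass it to file.write()."""
    with open(out_path, "w", encoding="utf8") as out:
        for path, name, text in rows:
            # write() adds no newline, so append "\n" explicitly
            out.write(f"{os.path.basename(path)}| {name}| {text}\n")
```

Both functions produce identical files; the print variant is usually the smaller change to existing code.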

Edit: Perhaps the least intrusive method is the following. First, include this somewhere:

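import contextlib
import sys

@contextlib.contextmanager
def stdout2file(fname):
    # Redirect everything print() writes to fname for the duration of the with block
    f = open(fname, 'w')
    sys.stdout = f
    try:
        yield
    finally:
        # Restore console output and close the file, even if an exception occurred
        sys.stdout = sys.__stdout__
        f.close()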

And then, in front of the line in which you loop over the files, add this line (and indent the loop beneath it accordingly):

with stdout2file("output.txt"):
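Since Python 3.4 the standard library also ships a ready-made context manager for this, contextlib.redirect_stdout, which can replace the hand-written stdout2file. A minimal sketch; run_with_output_file and demo are hypothetical names, and demo merely stands in for the loop over the HTML files. Unlike stdout2file, redirect_stdout restores whatever sys.stdout was before (not sys.__stdout__), and it does so even when an exception is raised.

```python
import contextlib


def run_with_output_file(fname, func):
    """Call func() while everything it prints goes to fname."""
    # Contexts exit in reverse order: stdout is restored first, then the file is closed
    with open(fname, "w", encoding="utf8") as f, contextlib.redirect_stdout(f):
        func()


def demo():
    # Hypothetical stand-in for the loop that prints the matches
    print("report.html| AuditFeesExpenses| 1,234")
```

In the question's script, the equivalent one-liner would be with open("output.txt", "w", encoding="utf8") as out, contextlib.redirect_stdout(out): in place of with stdout2file("output.txt"):.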
