Open every file/subfolder in directory and print results to .txt file

Views: 65 | Published: 2021/6/23 19:53:20 | Tags: python-3.x text-files pycharm subdirectory


Question

At the moment I am working with this code:

from bs4 import BeautifulSoup
import glob
import os
import re
import contextlib


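@contextlib.contextmanager
def stdout2file(fname):
    # Redirect everything printed inside the with-block into the file fname.
    import sys
    with open(fname, 'w') as f:
        sys.stdout = f
        try:
            yield
        finally:
            # Restore the real stdout even if the body raises an exception.
            sys.stdout = sys.__stdout__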

def trade_spider():
    os.chdir(r"C:\Users\6930p\FLO'S DATEIEN\Master FAU\Sommersemester 2016\02_Masterarbeit\04_Testumgebung\01_Probedateien für Analyseaspekt\Independent Auditors Report")
    with stdout2file("output.txt"):
        # '**' combined with recursive=True also descends into subfolders.
        for file in glob.iglob('**/*.html', recursive=True):
            with open(file, encoding="utf8") as f:
                contents = f.read()
                soup = BeautifulSoup(contents, "html.parser")
                for item in soup.find_all("ix:nonfraction"):
                    if re.match(".*AuditFeesExpenses", item['name']):
                        # Print only the file name, not the whole path.
                        print(os.path.basename(file), end="| ")
                        print(item['name'], end="| ")
                        print(item.get_text())


trade_spider()

So far this works perfectly, but now I am stuck on another issue. If I search within a folder that has no subfolders, only files, this works without problems. However, if I try to run this code on a folder that has subfolders, it doesn't work (it prints nothing!). Furthermore, I would like to get my results printed into a .txt file without the whole path in it. The result should look like this:

Filename.html| RegEX Match| HTML text

I do get this result already, but only in PyCharm and not in a separate .txt file.

To sum up, I have two questions:

1. How can I go through the subfolders of the directory I defined? Would os.walk() be an option?
2. How can I print my results into a .txt file? Would sys.stdout work for this?

Any help on this issue is appreciated!

UPDATE: It only prints the first result of the first file into my "output.txt" file (at least I think it is the first, as it is the last file in my only subfolder and recursive=True is activated). Any idea why it is not looping through all the other files?

UPDATE_2: Question resolved! The final code can be seen above.

Recommended answer

For walking through subdirectories, there are two options:

- Use ** with glob and the argument recursive=True (glob.glob('**/*.html', recursive=True)). This only works in Python 3.5+. I would also recommend using glob.iglob instead of glob.glob if the directory tree is large, since it yields matches one at a time instead of building the whole list first.
- Use os.walk and check the filenames (whether they end in ".html") manually or with fnmatch.filter.
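The os.walk option could be sketched roughly as follows. The function name find_html_files and its root parameter are made up for illustration and are not part of the original answer; only os.walk, fnmatch.filter and os.path.join are standard library calls.

```python
import fnmatch
import os


def find_html_files(root):
    """Yield the path of every .html file below root, subfolders included."""
    # os.walk visits root and then each subfolder in turn.
    for dirpath, dirnames, filenames in os.walk(root):
        # fnmatch.filter keeps only the names that match the pattern.
        for name in fnmatch.filter(filenames, "*.html"):
            yield os.path.join(dirpath, name)
```

Looping over find_html_files(".") would then play the same role as looping over glob.iglob('**/*.html', recursive=True) in the question's code.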
Regarding printing into a file, there are again several ways:

- Simply execute the script and redirect stdout, i.e. python3 myscript.py >myfile.txt
- Replace the calls to print with calls to the .write() method of a file object opened in write mode.
- Keep using print, but give it the argument file=myfile, where myfile is again a writable file object.
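As a rough illustration of the last two options, here is a minimal sketch. The helper names write_with_print and write_with_write, and the (path, name, text) row layout, are assumptions made for this example; they mirror the "Filename.html| RegEX Match| HTML text" format from the question.

```python
import os


def write_with_print(rows, out_path):
    """Write rows using print(..., file=myfile)."""
    with open(out_path, "w", encoding="utf8") as myfile:
        for path, name, text in rows:
            # os.path.basename drops the directories and keeps the file name.
            print(os.path.basename(path), name, text, sep="| ", file=myfile)


def write_with_write(rows, out_path):
    """Write the same lines using the file object's .write() method."""
    with open(out_path, "w", encoding="utf8") as myfile:
        for path, name, text in rows:
            # Unlike print, .write() does not add the newline for us.
            myfile.write("| ".join([os.path.basename(path), name, text]) + "\n")
```

Both helpers produce the same file contents. The print variant needs the fewest changes to code that already prints, while the .write() variant makes the newline handling explicit.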
Edit: Maybe the most unobtrusive method would be the following. First, include this somewhere:
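import contextlib
@contextlib.contextmanager
def stdout2file(fname):
    # Redirect everything printed inside the with-block into the file fname.
    import sys
    with open(fname, 'w') as f:
        sys.stdout = f
        try:
            yield
        finally:
            # Restore the real stdout even if the body raises an exception.
            sys.stdout = sys.__stdout__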

And then, in front of the line in which you loop over the files, add this line (and indent the loop accordingly):
with stdout2file("output.txt"):
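As a side note, the standard library offers contextlib.redirect_stdout (Python 3.4+), which serves the same purpose as a hand-written stdout2file helper. The sketch below is an alternative, not the answer's own method; the function name run_with_stdout_in_file and its parameters are hypothetical.

```python
import contextlib


def run_with_stdout_in_file(func, out_path):
    """Call func() while everything it prints is written to out_path."""
    with open(out_path, "w", encoding="utf8") as f:
        # redirect_stdout restores the previous stdout when the block exits,
        # including when func() raises an exception.
        with contextlib.redirect_stdout(f):
            func()
```

With the question's code, calling run_with_stdout_in_file(trade_spider, "output.txt") would presumably replace the with stdout2file(...) line inside trade_spider.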

