Accessing a text file within a subfolder


Question

File Structure

I have a folder called test_folder that contains several subfolders (each named after a fruit, as you'll see in my code below). Each subfolder always contains a metadump.xml file, from which I am extracting information.

Current State

I have been able to do this for one subfolder at a time by specifying the subfolder path explicitly.

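import re

# Read the whole XML file; the with-block closes it automatically.
with open("C:/.../Downloads/test_folder/apple/metadump.xml") as in_file:
    contents = in_file.read()

# Extract the text between the <dc:title ...> tags (pattern kept on one line).
title = re.search('<dc:title rsfieldtitle="Title" rsembeddedequiv="Name" rsfieldref="8" rsfieldtype="0">(.+?)</dc:title>', contents).group(1)
print(title)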

Next Steps

I would like to do this on a larger scale by referencing only the parent folder C:/.../Downloads/test_folder and having my program find the xml file in each subfolder and extract the desired information, rather than specifying every fruit subfolder individually.

Clarification

Rather than simply obtaining a list of subfolders, or a list of the xml files within them, I would like to actually open each subfolder's xml file and run this text extraction on it.

Thanks in advance for your help.

Recommended Answer

You can use Python's os.walk() to traverse all of the subfolders. Whenever the loop finds a file named metadump.xml, it opens the file and extracts your title. The filename and the title are then displayed:

import os
import re

# Visit test_folder and every subfolder beneath it.
for root, dirs, files in os.walk(r"C:\...\Downloads\test_folder"):
    for file in files:
        if file == 'metadump.xml':
            filename = os.path.join(root, file)

            with open(filename) as f_xml:
                contents = f_xml.read()

            # re.search() returns None when the file has no matching title tag.
            match = re.search('<dc:title rsfieldtitle="Title" rsembeddedequiv="Name" rsfieldref="8" rsfieldtype="0">(.+?)</dc:title>', contents)
            if match:
                print('{} : {}'.format(filename, match.group(1)))
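
If you would rather collect the titles than print them, the same traversal can be wrapped in a function. The following is only a sketch under a few assumptions, not part of the original answer: it swaps os.walk() for pathlib's Path.rglob(), the helper name collect_titles is made up for illustration, the files are assumed to be UTF-8, and the pattern is loosened to match any <dc:title ...> tag regardless of its attributes. The demo at the bottom runs against a temporary folder with made-up contents, so it does not touch your real test_folder.

```python
import re
import tempfile
from pathlib import Path

# Assumption: a loosened pattern that accepts any attributes on <dc:title>.
TITLE_RE = re.compile(r"<dc:title\b[^>]*>(.+?)</dc:title>", re.DOTALL)


def collect_titles(parent):
    """Hypothetical helper: map each subfolder name to its title (or None)."""
    titles = {}
    # rglob() searches the parent folder and all nested subfolders.
    for xml_path in sorted(Path(parent).rglob("metadump.xml")):
        # Assumption: the files are UTF-8 encoded.
        contents = xml_path.read_text(encoding="utf-8")
        match = TITLE_RE.search(contents)
        # Store None instead of raising when a file has no title tag.
        titles[xml_path.parent.name] = match.group(1) if match else None
    return titles


# Demo on a throwaway folder with made-up contents.
with tempfile.TemporaryDirectory() as tmp:
    for fruit in ("apple", "banana"):
        subfolder = Path(tmp) / fruit
        subfolder.mkdir()
        (subfolder / "metadump.xml").write_text(
            '<dc:title rsfieldtitle="Title">{} photo</dc:title>'.format(fruit),
            encoding="utf-8",
        )
    print(collect_titles(tmp))
```

Keying the result by subfolder name assumes the names are unique across the tree; if they might repeat at different depths, keying by the full path would be safer.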
