Accessing text file within subfolder
Problem description
File Structure
I have a folder called test_folder that contains several subfolders (named after different fruits, as you'll see in my code below). Each subfolder always contains a metadump.xml file from which I extract information.
Current State
I have been able to do this for one subfolder at a time by specifying its path.
import re
in_file = open("C:/.../Downloads/test_folder/apple/metadump.xml")
contents = in_file.read()
in_file.close()
title = re.search('<dc:title rsfieldtitle="Title" rsembeddedequiv="Name" rsfieldref="8" rsfieldtype="0">(.+?)</dc:title>', contents).group(1)
print(title)
Next Steps
I would like to do this on a larger scale by referencing only the parent folder C:/.../Downloads/test_folder and having my program find the xml file in each subfolder and extract the desired information, rather than specifying every fruit subfolder individually.
Clarification
Rather than simply obtaining a list of subfolders or a list of xml files within these subfolders, I would like to actually open each subfolder and run this text extraction on the xml file inside it.
Thanks in advance for your help.
Recommended answer
You can use Python's os.walk() to traverse all of the subfolders. Whenever a file is named metadump.xml, the code below opens it and extracts your title. The filename and the title are then displayed:
import os
import re

# Walk test_folder and every subfolder beneath it
for root, dirs, files in os.walk(r"C:\...\Downloads\test_folder"):
    for file in files:
        if file == 'metadump.xml':
            filename = os.path.join(root, file)
            with open(filename) as f_xml:
                contents = f_xml.read()
            # Extract the text between the <dc:title ...> tags
            title = re.search('<dc:title rsfieldtitle="Title" rsembeddedequiv="Name" rsfieldref="8" rsfieldtype="0">(.+?)</dc:title>', contents).group(1)
            print('{} : {}'.format(filename, title))
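Note that re.search() returns None when a file has no matching title, so the .group(1) call in the os.walk() loop above would raise an AttributeError for such a file. As a rough sketch of one way to guard against that, the same traversal can be written with pathlib. The function name collect_titles, the looser pattern that ignores the tag's attributes, and the UTF-8 encoding are all assumptions made for illustration, not part of the original question or answer:

```python
import re
from pathlib import Path

# Assumed, looser pattern: matches a <dc:title ...> element whatever its attributes are.
TITLE_PATTERN = re.compile(r"<dc:title\b[^>]*>(.+?)</dc:title>", re.DOTALL)


def collect_titles(parent_folder, target_name="metadump.xml"):
    """Map each target file found under parent_folder to its title (None if absent)."""
    titles = {}
    # rglob() searches parent_folder and all of its subfolders recursively.
    for xml_path in sorted(Path(parent_folder).rglob(target_name)):
        # UTF-8 is an assumption; adjust if the files use another encoding.
        contents = xml_path.read_text(encoding="utf-8")
        match = TITLE_PATTERN.search(contents)
        titles[str(xml_path)] = match.group(1) if match else None
    return titles
```

Calling collect_titles(r"C:\...\Downloads\test_folder") would then return a dictionary keyed by file path, which can be printed or processed further; files without a title show up with a value of None instead of stopping the run.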