Parse many XML files to one CSV file
Problem description
The code below takes an XML file and parses specific elements into a CSV file. I previously had simpler code that produced slightly different output; the version below is the result of a lot of help from here.
    from xml.etree import ElementTree as ET
    from collections import defaultdict
    import csv

    tree = ET.parse('thexmlfile.xml')
    root = tree.getroot()

    with open('output.csv', 'w', newline='') as f:
        writer = csv.writer(f)
        start_nodes = root.findall('.//START')
        headers = ['id', 'service_code', 'rational', 'qualify', 'description_num', 'description_txt', 'set_data_xin', 'set_data_xax', 'set_data_value', 'set_data_x']
        writer.writerow(headers)
        for sn in start_nodes:
            row = defaultdict(str)
            for k, v in sn.attrib.items():
                row[k] = v
            for rn in sn.findall('.//Rational'):
                row['rational'] = rn.text
            for qu in sn.findall('.//Qualify'):
                row['qualify'] = qu.text
            for ds in sn.findall('.//Description'):
                row['description_txt'] = ds.text
                row['description_num'] = ds.attrib['num']
            # all other tags except set data must be parsed before this.
            for st in sn.findall('.//SetData'):
                for k, v in st.attrib.items():
                    row['set_data_' + str(k)] = v
                row_data = [row[i] for i in headers]
                writer.writerow(row_data)
                row = defaultdict(str)
I'm trying to make this code go through a folder that contains many XML files and parse them all into one single CSV file. In other words, instead of parsing one XML file, I want to do this for multiple XML files and write the results to one CSV file.
What I would normally do is use os.listdir(). The code would look something like this:
    import os

    directory = 'C:/Users/docs/FolderwithXMLs'
    for filename in os.listdir(directory):
        if filename.endswith(".xml"):
            # Something here
            df.to_csv("./output.csv")
            continue
        else:
            continue
So far I have tried different ways of integrating this into the code above, without success. The process should also be fast.
Solution
Try:
    import csv
    from collections import defaultdict
    from pathlib import Path
    from xml.etree import ElementTree as ET

    directory = 'C:/Users/docs/FolderwithXMLs'

    with open('output.csv', 'w', newline='') as f:
        writer = csv.writer(f)
        headers = ['id', 'service_code', 'rational', 'qualify', 'description_num', 'description_txt', 'set_data_xin', 'set_data_xax', 'set_data_value', 'set_data_x']
        writer.writerow(headers)
        # '**/*.xml' also matches XML files in subfolders of directory.
        xml_files_list = list(map(str, Path(directory).glob('**/*.xml')))
        for xml_file in xml_files_list:
            tree = ET.parse(xml_file)
            root = tree.getroot()
            start_nodes = root.findall('.//START')
            for sn in start_nodes:
                row = defaultdict(str)
                # <<<<< Indentation was wrong here
                for k, v in sn.attrib.items():
                    row[k] = v
                # Rest of the code here.
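For completeness, here is one way the pieces might fit together once the "rest of the code" from the question is filled in. This is only a sketch under some assumptions: the helper names `rows_from_xml` and `xml_folder_to_csv` are made up for illustration, one CSV row is written per SetData element (as the reset of `row` in the question's code suggests), files are processed in sorted path order, and a missing `num` attribute is written as an empty string instead of raising an error.

```python
import csv
from collections import defaultdict
from pathlib import Path
from xml.etree import ElementTree as ET

HEADERS = ['id', 'service_code', 'rational', 'qualify', 'description_num',
           'description_txt', 'set_data_xin', 'set_data_xax',
           'set_data_value', 'set_data_x']


def rows_from_xml(xml_path):
    """Yield one list of CSV values per SetData element found in one XML file.

    Hypothetical helper: it wraps the per-file logic from the question.
    """
    root = ET.parse(str(xml_path)).getroot()
    for sn in root.findall('.//START'):
        row = defaultdict(str)
        for k, v in sn.attrib.items():
            row[k] = v
        for rn in sn.findall('.//Rational'):
            row['rational'] = rn.text
        for qu in sn.findall('.//Qualify'):
            row['qualify'] = qu.text
        for ds in sn.findall('.//Description'):
            row['description_txt'] = ds.text
            # Assumption: tolerate a missing 'num' attribute.
            row['description_num'] = ds.attrib.get('num', '')
        # All other tags except SetData must be parsed before this.
        for st in sn.findall('.//SetData'):
            for k, v in st.attrib.items():
                row['set_data_' + str(k)] = v
            yield [row[h] for h in HEADERS]
            # Reset so later SetData rows carry only their own attributes.
            row = defaultdict(str)


def xml_folder_to_csv(directory, output_csv):
    """Parse every .xml file under directory (recursively) into one CSV file."""
    with open(output_csv, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(HEADERS)  # header is written once, not per file
        for xml_file in sorted(Path(directory).glob('**/*.xml')):
            writer.writerows(rows_from_xml(xml_file))
```

Calling `xml_folder_to_csv('C:/Users/docs/FolderwithXMLs', 'output.csv')` would then produce the single combined file. The CSV is opened once and rows are streamed into it file by file, so nothing has to be held in memory beyond the XML tree currently being parsed.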
Hope that helps.