Parse many XML files to one CSV file


Question


The code below takes an XML file and parses specific elements into a CSV file. I previously had simpler code that produced slightly different output; the version below is the result of a lot of help from here.

from xml.etree import ElementTree as ET
from collections import defaultdict
import csv

tree = ET.parse('thexmlfile.xml')
root = tree.getroot()

with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)

    start_nodes = root.findall('.//START')
    headers = ['id', 'service_code', 'rational', 'qualify', 'description_num', 'description_txt', 'set_data_xin', 'set_data_xax', 'set_data_value', 'set_data_x']
    writer.writerow(headers)
    for sn in start_nodes:
        row = defaultdict(str)

        for k,v in sn.attrib.items():
            row[k] = v

        for rn in sn.findall('.//Rational'):
            row['rational'] = rn.text

        for qu in sn.findall('.//Qualify'):
            row['qualify'] = qu.text

        for ds in sn.findall('.//Description'):
            row['description_txt'] = ds.text
            row['description_num'] = ds.attrib['num']

        # all other tags except set data must be parsed before this.
        for st in sn.findall('.//SetData'):
            for k,v in st.attrib.items():
                row['set_data_'+ str(k)] = v
            row_data = [row[i] for i in headers]
            writer.writerow(row_data)
            row = defaultdict(str)

I'm trying to make this code go through a folder that contains many XML files and parse all of them into one single CSV file. Simply put: instead of parsing one XML file, do the same for multiple XML files and write the results to one CSV file.

What I would normally do is use os.listdir(). The code would look something like this:

import os

directory = 'C:/Users/docs/FolderwithXMLs'
for filename in os.listdir(directory):
    if filename.endswith(".xml"):
        # Something here: parse the file and collect its rows
        df.to_csv("./output.csv")  # placeholder: df is not defined in this sketch

So far I have tried different ways to integrate this into the code above, without success. The process should also be fast.

Solution

Try:


from xml.etree import ElementTree as ET
from collections import defaultdict
from pathlib import Path
import csv

directory = 'C:/Users/docs/FolderwithXMLs'

with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)

    headers = ['id', 'service_code', 'rational', 'qualify', 'description_num', 'description_txt', 'set_data_xin', 'set_data_xax', 'set_data_value', 'set_data_x']

    writer.writerow(headers)

    xml_files_list = list(map(str,Path(directory).glob('**/*.xml')))
    for xml_file in xml_files_list:
        tree = ET.parse(xml_file)
        root = tree.getroot()

        start_nodes = root.findall('.//START')
        for sn in start_nodes:
            row = defaultdict(str)

            # <<<<< Indentation was wrong here
            for k,v in sn.attrib.items():
                row[k] = v

            # Rest of the code here.
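
For reference, the loop above can be combined with the per-START parsing from the question into one function. This is only a sketch under assumptions: the names `HEADERS` and `xml_folder_to_csv` are made up for illustration, the files are processed in sorted order, and the row logic is copied from the question as-is (including resetting `row` after every SetData element, so only the first SetData row of a START carries the START-level fields).

```python
import csv
from collections import defaultdict
from pathlib import Path
from xml.etree import ElementTree as ET

# Column order of the output file, taken from the question.
HEADERS = ['id', 'service_code', 'rational', 'qualify',
           'description_num', 'description_txt',
           'set_data_xin', 'set_data_xax', 'set_data_value', 'set_data_x']


def xml_folder_to_csv(directory, output_csv, headers=HEADERS):
    """Parse every *.xml under `directory` into one CSV file.

    Hypothetical helper; returns the number of data rows written.
    """
    written = 0
    with open(output_csv, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(headers)  # header is written once, before any file

        # '**/*.xml' also matches files in subfolders of `directory`.
        for xml_file in sorted(Path(directory).glob('**/*.xml')):
            root = ET.parse(str(xml_file)).getroot()

            for sn in root.findall('.//START'):
                row = defaultdict(str)  # missing columns become ''

                for k, v in sn.attrib.items():
                    row[k] = v
                for rn in sn.findall('.//Rational'):
                    row['rational'] = rn.text
                for qu in sn.findall('.//Qualify'):
                    row['qualify'] = qu.text
                for ds in sn.findall('.//Description'):
                    row['description_txt'] = ds.text
                    row['description_num'] = ds.attrib['num']

                # One CSV row per SetData element, as in the question.
                for st in sn.findall('.//SetData'):
                    for k, v in st.attrib.items():
                        row['set_data_' + str(k)] = v
                    writer.writerow([row[i] for i in headers])
                    written += 1
                    row = defaultdict(str)
    return written
```

It could be called as `xml_folder_to_csv('C:/Users/docs/FolderwithXMLs', 'output.csv')`. Opening the CSV once and streaming rows into it avoids rewriting the output for every input file, which is what the `df.to_csv(...)` call inside the loop in the question would have done.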

Hope that helps.
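
One difference from the os.listdir() idea in the question: the pattern '**/*.xml' also descends into subfolders, while os.listdir() only lists the top-level folder. A small sketch of both behaviours follows; the helper name `list_xml_files` and its `recursive` flag are assumptions made for illustration, not part of the answer.

```python
from pathlib import Path


def list_xml_files(directory, recursive=True):
    """Return sorted paths (as strings) of the .xml files in `directory`.

    Hypothetical helper: recursive=True mirrors the '**/*.xml' pattern from
    the answer, recursive=False only looks at the top-level folder, much like
    filtering the result of os.listdir().
    """
    pattern = '**/*.xml' if recursive else '*.xml'
    return sorted(str(p) for p in Path(directory).glob(pattern) if p.is_file())
```

If the folder has no subfolders, both variants return the same list.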
