Python - Converting xml to csv using Python pandas
Question
I am new here and have been trying to write a small Python script to convert XML to CSV. Based on various posts I have read on Stack Overflow, I managed to put together sample code that works fine. However, the data I am working with has multiple layers, so I am unsure how to extract the data at the leaf level.
Below is what the data looks like:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Transmission>
  <TransmissionBody>
    <level1>
      <level2>
        <level3>
          <level4>
            <level5>
              <level6>
                <ColA>ABC</ColA>
                <ColB>123</ColB>
              </level6>
            </level5>
          </level4>
        </level3>
      </level2>
    </level1>
  </TransmissionBody>
</Transmission>
I am using the code below to try to convert the XML to CSV:
import pandas as pd
import xml.etree.ElementTree as ET

tree = ET.parse('file.xml')
root = tree.getroot()

final = {}
for elem in root:
    if len(elem):
        # Iterate the element directly; getchildren() was removed in Python 3.9
        for c in elem:
            final[c.tag] = c.text
    else:
        final[elem.tag] = elem.text

df = pd.DataFrame([final])
df.to_csv('file.csv')
This code, however, only pulls level2 and not ColA from level6.
Expected output:
Transmission,TransmissionBody,level1,level2,level3,level4,level5,level6,ColA,ColB
,,,,,,,,ABC,123
,,,,,,,,DEF,456
Updated code:
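import glob
import pandas as pd
import xml.etree.ElementTree as ET

def f(elem, result):
    result[elem.tag] = elem.text
    for c in elem:  # replaces getchildren(), removed in Python 3.9
        result = f(c, result)
    return result

# folder is a glob pattern defined elsewhere
allFiles = glob.glob(folder)
for xmlfile in allFiles:
    tree = ET.parse(xmlfile)
    root = tree.getroot()
    d = f(root, {})
    # df is rebuilt on every iteration, so only the last file's data survives
    df = pd.DataFrame(d, index=['values'])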
Recommended answer
If I understood your question correctly, you need to traverse the whole XML tree, so you probably want a recursive function that visits each element and then its children. Something like the following:
import pandas as pd
import xml.etree.ElementTree as ET

tree = ET.parse('file.xml')
root = tree.getroot()

def f(elem, result):
    # Record this element's text, then recurse into its children.
    result[elem.tag] = elem.text
    for c in elem:  # getchildren() was removed in Python 3.9
        result = f(c, result)
    return result

d = f(root, {})
df = pd.DataFrame(d, index=['values']).T
df
Out:
values
Transmission \n
TransmissionBody \n
level1 \n
level2 \n
level3 \n
level4 \n
level5 \n
level6 \n
ColA ABC
ColB 123
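The same traversal can also be written without explicit recursion, since Element.iter() walks an element and all of its descendants in document order. The following is only a minimal sketch: the helper name flatten is hypothetical, and an inline XML string (shortened to fewer levels) stands in for file.xml.

```python
import xml.etree.ElementTree as ET

# Inline stand-in for file.xml; the nesting is shortened for illustration.
XML = """<Transmission>
  <TransmissionBody>
    <level1>
      <ColA>ABC</ColA>
      <ColB>123</ColB>
    </level1>
  </TransmissionBody>
</Transmission>"""

def flatten(root):
    """Map every tag under root (root included) to its text, in document order."""
    return {elem.tag: elem.text for elem in root.iter()}

flat = flatten(ET.fromstring(XML))
print(list(flat))                  # tag names, outermost first
print(flat["ColA"], flat["ColB"])  # leaf values
```

As with the recursive version, repeated tag names overwrite each other, so this only fits documents where each tag appears once.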
Update: Here is how to do it for multiple XML files. I've added another file similar to the original one, with the ColA and ColB rows replaced with
<ColA>DEF</ColA>
<ColB>456</ColB>
The code is as follows:
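import glob

def f(elem, result):
    result[elem.tag] = elem.text
    for c in elem:  # getchildren() was removed in Python 3.9
        result = f(c, result)
    return result

# Build one dict per file; a single shared dict would be overwritten by each new file.
results = []
for file in sorted(glob.glob('*.xml')):
    tree = ET.parse(file)
    root = tree.getroot()
    results.append(f(root, {}))

# One column per file (0, 1, ...), one row per tag
df = pd.DataFrame(results).T
df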
Output:
0 1
Transmission \n \n
TransmissionBody \n \n
level1 \n \n
level2 \n \n
level3 \n \n
level4 \n \n
level5 \n \n
level6 \n \n
ColA ABC DEF
ColB 123 456
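The table above has one column per file, whereas the expected output in the question has one CSV row per file. A rough sketch of producing that layout follows. It rests on several assumptions: inline XML strings (with shortened nesting) stand in for the files on disk, the flatten helper is hypothetical, and the standard-library csv module is used so the example runs without pandas.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Inline stand-ins for the two XML files; the nesting is shortened for illustration.
TEMPLATE = (
    "<Transmission><TransmissionBody><level1>"
    "<ColA>{a}</ColA><ColB>{b}</ColB>"
    "</level1></TransmissionBody></Transmission>"
)
DOCS = [TEMPLATE.format(a="ABC", b="123"), TEMPLATE.format(a="DEF", b="456")]

def flatten(root):
    """Map each tag to its stripped text; container elements become ''."""
    return {elem.tag: (elem.text or "").strip() for elem in root.iter()}

# One dict per document, which becomes one CSV row per document.
rows = [flatten(ET.fromstring(doc)) for doc in DOCS]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Stripping the text turns the whitespace-only values of the container elements into empty fields, which matches the blank leading columns in the expected output. With pandas, pd.DataFrame(rows).to_csv('file.csv', index=False) should write the same header and rows.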