Python - Converting xml to csv using Python pandas

Question

I am new here and have been trying to write a small Python script to convert XML to CSV. Based on various posts here on Stack Overflow, I managed to come up with sample code that works fine. However, the data I am working with has multiple layers, so I am unsure how to extract the data at the leaf level.

Here is what the data looks like:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Transmission>
    <TransmissionBody>
        <level1>
            <level2>
                <level3>
                    <level4>
                        <level5>
                            <level6>
                                <ColA>ABC</ColA>
                                <ColB>123</ColB>
                            </level6>
                        </level5>
                    </level4>
                </level3>
            </level2>
        </level1>
    </TransmissionBody>
</Transmission>

I am trying to use the code below to convert the XML to CSV:

import pandas as pd
import xml.etree.ElementTree as ET

tree = ET.parse('file.xml')
root = tree.getroot()
final = {}
for elem in root:
    if len(elem):
        for c in elem:  # Element.getchildren() was removed in Python 3.9
            final[c.tag] = c.text
    else:
        final[elem.tag] = elem.text

df = pd.DataFrame([final])
df.to_csv('file.csv')

However, this code only pulls level2 and not ColA from level6.

Expected output:

Transmission,TransmissionBody,level1,level2,level3,level4,level5,level6,ColA,ColB
,,,,,,,,ABC,123
,,,,,,,,DEF,456

Updated code:

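import glob
import xml.etree.ElementTree as ET

import pandas as pd

def f(elem, result):
    # Record this element's text, then recurse into its children.
    result[elem.tag] = elem.text
    for c in elem:  # Element.getchildren() was removed in Python 3.9
        result = f(c, result)
    return result

allFiles = glob.glob(folder)  # folder: a glob pattern defined elsewhere
for xmlfile in allFiles:
    tree = ET.parse(xmlfile)
    root = tree.getroot()
    d = f(root, {})
    df = pd.DataFrame(d, index=['values'])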

Recommended answer

If I understood your question correctly, you need to traverse the whole XML tree, so you probably want a recursive function that does that. Something like the following:

import pandas as pd
import xml.etree.ElementTree as ET

tree = ET.parse('file.xml')
root = tree.getroot()

def f(elem, result):
    # Record this element's text, then recurse into its children.
    result[elem.tag] = elem.text
    for c in elem:  # Element.getchildren() was removed in Python 3.9
        result = f(c, result)
    return result

d = f(root, {})
df = pd.DataFrame(d, index=['values']).T
df

Out:

    values
Transmission    \n
TransmissionBody    \n
level1  \n
level2  \n
level3  \n
level4  \n
level5  \n
level6  \n
ColA    ABC
ColB    123
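
The non-leaf rows above hold only the whitespace between tags. If only the leaf values are wanted, one possible variation is to walk the tree with `Element.iter()` and keep the elements that have no children. This is a minimal sketch, not part of the original answer: the inline XML is a shortened, hypothetical stand-in for the question's file, and `leaf_values` is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Hypothetical inline sample, shortened from the XML in the question.
XML = """<Transmission>
    <TransmissionBody>
        <level1>
            <ColA>ABC</ColA>
            <ColB>123</ColB>
        </level1>
    </TransmissionBody>
</Transmission>"""

def leaf_values(root):
    """Return {tag: text} for every element that has no child elements."""
    # Element.iter() walks the whole subtree in document order, root included;
    # len(elem) is the number of direct children of elem.
    return {elem.tag: elem.text for elem in root.iter() if len(elem) == 0}

leaves = leaf_values(ET.fromstring(XML))
print(leaves)  # {'ColA': 'ABC', 'ColB': '123'}
```

Like the recursive function, this keeps one value per tag name, so repeated leaf tags within a single file would overwrite each other.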

Update: Here is how to do it for multiple XML files. I've added another file similar to the original one, with the ColA and ColB rows replaced with:

<ColA>DEF</ColA>
<ColB>456</ColB>

The code is as follows:

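import glob
import xml.etree.ElementTree as ET

import pandas as pd

def f(elem, result):
    result[elem.tag] = elem.text
    for c in elem:  # Element.getchildren() was removed in Python 3.9
        result = f(c, result)
    return result

# Build one dict per file; reusing a single dict would let each
# file overwrite the values of the previous one.
rows = []
for file in sorted(glob.glob('*.xml')):
    tree = ET.parse(file)
    root = tree.getroot()
    rows.append(f(root, {}))

df = pd.DataFrame(rows).T
df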

Output:

                    0    1
Transmission       \n   \n
TransmissionBody   \n   \n
level1             \n   \n
level2             \n   \n
level3             \n   \n
level4             \n   \n
level5             \n   \n
level6             \n   \n
ColA              ABC  DEF
ColB              123  456
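
The expected output in the question has one row per file and empty cells for the container elements. A rough sketch of one way to get that shape follows. It assumes the two documents are available as strings (a hypothetical `TEMPLATE` stands in for the real files), strips whitespace-only text so the non-leaf columns come out empty, and skips the transpose so that each file becomes a row.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical stand-in for the two XML files described above.
TEMPLATE = """<Transmission><TransmissionBody><level1><level2><level3><level4>
<level5><level6><ColA>{a}</ColA><ColB>{b}</ColB></level6></level5>
</level4></level3></level2></level1></TransmissionBody></Transmission>"""

docs = [TEMPLATE.format(a="ABC", b="123"), TEMPLATE.format(a="DEF", b="456")]

def flatten(root):
    """Map every tag to its stripped text; containers become ''."""
    return {elem.tag: (elem.text or "").strip() for elem in root.iter()}

# One dict per document, so each document becomes one row.
rows = [flatten(ET.fromstring(doc)) for doc in docs]
df = pd.DataFrame(rows)

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv(index=False)
print(csv_text)
```

To work with real files, `ET.fromstring(doc)` could be swapped for `ET.parse(path).getroot()`, and passing a filename to `to_csv` would write the result to disk.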
