Python: Extracting XML to DataFrame (Pandas)

Question

I have an XML file that looks like this:

<?xml version="1.0" encoding="utf-8"?>
<comments>
<row Id="1" PostId="2" Score="0" Text="(...)" CreationDate="2011-08-30T21:15:28.063" UserId="16" />
<row Id="2" PostId="17" Score="1" Text="(...)" CreationDate="2011-08-30T21:24:56.573" UserId="27" />
<row Id="3" PostId="26" Score="0" Text="(...)" UserId="9" />
</comments>

What I'm trying to do is extract the ID, Text and CreationDate columns into a pandas DataFrame, and I've tried the following:

import xml.etree.cElementTree as et
import pandas as pd
path = '/.../...'
dfcols = ['ID', 'Text', 'CreationDate']
df_xml = pd.DataFrame(columns=dfcols)

root = et.parse(path)
rows = root.findall('.//row')
for row in rows:
    ID = row.find('Id')
    text = row.find('Text')
    date = row.find('CreationDate')
    print(ID, text, date)
    df_xml = df_xml.append(pd.Series([ID, text, date], index=dfcols), ignore_index=True)

print(df_xml)

But the output is just: None None None

Could you please tell me how to fix this? Thanks.

Answer

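Never call DataFrame.append or pd.concat inside a for-loop: each call copies all the rows accumulated so far, so the total copying grows quadratically with the number of rows. (DataFrame.append was also deprecated in pandas 1.4 and removed in pandas 2.0.)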
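When each iteration naturally yields a whole DataFrame (for example, one per input file), the same rule means collecting the pieces in a list and calling pd.concat exactly once. A minimal sketch, assuming such per-item frames exist; the `chunks` list below is hypothetical stand-in data, not part of the question:

```python
import pandas as pd

# Hypothetical per-item frames; in real code each might come from a separate file.
chunks = [pd.DataFrame({"ID": [i], "Text": [f"comment {i}"]}) for i in range(1, 4)]

# Anti-pattern (avoid): growing a frame inside the loop copies it on every iteration.
# df = pd.DataFrame()
# for chunk in chunks:
#     df = pd.concat([df, chunk])

# Preferred: collect the pieces first, then concatenate once.
df = pd.concat(chunks, ignore_index=True)
print(df)
```

The same idea applies when the per-item data is just plain Python values: accumulate them in a list and build the DataFrame in one constructor call.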

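Instead, parse your XML data into a list first, then pass that list to the DataFrame constructor in a single call outside of any loop.

Your None values have a separate cause: Id, Text and CreationDate are attributes of each <row> element, not child elements. Element.find('Id') searches for a child tag named Id, finds none, and returns None. Read attributes with row.get('Id') instead; get() also returns None when an attribute is missing, as with CreationDate on the third row.

You can build the nested list with a list comprehension and pass it directly to the constructor: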

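import xml.etree.ElementTree as et  # cElementTree was removed in Python 3.9
import pandas as pd

path = 'AttributesXMLPandas.xml'
dfcols = ['ID', 'Text', 'CreationDate']

tree = et.parse(path)
rows = tree.findall('.//row')

# NESTED LIST: one inner list per <row>; get() reads attributes and
# returns None when an attribute (e.g. CreationDate) is missing
xml_data = [[row.get('Id'), row.get('Text'), row.get('CreationDate')]
            for row in rows]

# Build the DataFrame once, outside any loop
df_xml = pd.DataFrame(xml_data, columns=dfcols)

print(df_xml)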

#   ID   Text             CreationDate
# 0  1  (...)  2011-08-30T21:15:28.063
# 1  2  (...)  2011-08-30T21:24:56.573
# 2  3  (...)                     None
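A small stdlib-only check makes the difference between find() and get() visible. This sketch inlines the question's sample XML as a string (the name `xml_text` is just for illustration) instead of reading it from a file:

```python
import xml.etree.ElementTree as et

# Sample data from the question, inlined so the snippet runs on its own.
xml_text = """<?xml version="1.0" encoding="utf-8"?>
<comments>
<row Id="1" PostId="2" Score="0" Text="(...)" CreationDate="2011-08-30T21:15:28.063" UserId="16" />
<row Id="2" PostId="17" Score="1" Text="(...)" CreationDate="2011-08-30T21:24:56.573" UserId="27" />
<row Id="3" PostId="26" Score="0" Text="(...)" UserId="9" />
</comments>"""

# Encode to bytes so the encoding declaration is handled by the parser.
root = et.fromstring(xml_text.encode("utf-8"))
rows = root.findall(".//row")
first = rows[0]

# find() looks for a *child element* named "Id"; <row> has no children, so it returns None.
print(first.find("Id"))                        # None
# get() reads an *attribute* of the element itself.
print(first.get("Id"))                         # 1
# A missing attribute also gives None, or a default you supply.
print(rows[2].get("CreationDate"))             # None
print(rows[2].get("CreationDate", "missing"))  # missing
# attrib is a plain dict of all attributes on the element.
print(first.attrib["Text"])                    # (...)
```

Because `attrib` is an ordinary dict, it is also a convenient way to inspect which attributes a given row actually carries before deciding which columns to extract.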