Parse through an XML file in Python
Problem description
I am looking to parse the following XML: http://charts.realclearpolitics.com/charts/1044.xml. I want the result in a data frame with 3 columns: Date, Approve, Disapprove. The XML file is dynamic in the sense that a new date is added each day, so the code should account for that. I have implemented a static solution, i.e. I have to hardcode the row numbers of the value tags in each loop. I would like to learn how to implement it dynamically.
import pandas as pd
import requests
from pattern import web

xml = requests.get('http://charts.realclearpolitics.com/charts/1044.xml').text
dom = web.Element(xml)
values = dom.by_tag('value')

date = []
approve = []
disapprove = []

# The range ends at 1720 instead of 1727 because the last 6 values of the Approve & Disapprove tags are blank.
for i in range(0, 1720):
    date.append(pd.to_datetime(values[i].content))

# The range ends at 3447 instead of 3454 because the last 6 values are blank; going up to 3454 raises an error when converting to float.
for i in range(1727, 3447):
    a = float(values[i].content)
    approve.append(a)

# The range ends at 5174 instead of 5181 because the last 6 values are blank.
for i in range(3454, 5174):
    a = float(values[i].content)
    disapprove.append(a)

finalresult = pd.DataFrame({'date': date, 'Approve': approve, 'Disapprove': disapprove})
finalresult
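The hardcoded ranges above work because by_tag('value') returns every value element in document order, and judging by the ranges (0 to 1727, 1727 to 3454, 3454 to 5181) that list is three equal blocks: dates, then Approve values, then Disapprove values. A minimal sketch, assuming that layout holds, derives the block length from the list itself and skips blank rows instead of counting them by hand. The helper name split_flat_values is hypothetical, and it takes plain strings such as [v.content for v in dom.by_tag('value')] so it does not depend on the pattern library.

```python
def split_flat_values(contents):
    """Split a flat list of <value> texts into (date, approve, disapprove) rows.

    Assumption: the list is three equal-length blocks in document order
    (dates, then Approve values, then Disapprove values), as the hardcoded
    ranges in the question imply.
    """
    if len(contents) % 3 != 0:
        raise ValueError("expected three equal-length blocks of <value> tags")
    n = len(contents) // 3  # block length, computed instead of hardcoded
    dates = contents[:n]
    approve = contents[n:2 * n]
    disapprove = contents[2 * n:]

    rows = []
    for d, a, dis in zip(dates, approve, disapprove):
        # Skip rows whose Approve/Disapprove entries are blank, however many there are.
        if a.strip() and dis.strip():
            rows.append((d, float(a), float(dis)))
    return rows


# Made-up sample: two dates, the second with blank Approve/Disapprove values.
sample = ["2009-01-20", "2009-01-21", "67.0", "", "12.0", ""]
rows = split_flat_values(sample)
print(rows)  # [('2009-01-20', 67.0, 12.0)]
```

The dates stay as strings here; pd.to_datetime can still be applied to them afterwards, as in the question. Relying on document order is fragile if the feed ever adds another series, which is why selecting each series by its title (as the answer below does) is the sturdier choice.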
Recommended answer
Here is one way to do it with lxml and XPath:
from lxml import etree
import pandas as pd

# lxml can parse a document directly from an http:// URL
tree = etree.parse("http://charts.realclearpolitics.com/charts/1044.xml")

# Dates are under <series>; each data series is a <graph> with a title attribute
date = [s.text for s in tree.xpath("series/value")]
# Blank <value> elements have no text; this maps them to 0.0
approve = [float(s.text) if s.text else 0.0
           for s in tree.xpath("graphs/graph[@title='Approve']/value")]
disapprove = [float(s.text) if s.text else 0.0
              for s in tree.xpath("graphs/graph[@title='Disapprove']/value")]

# Lengths come from the file itself, so nothing is hardcoded
assert len(date) == len(approve) == len(disapprove)

finalresult = pd.DataFrame({'Date': date, 'Approve': approve, 'Disapprove': disapprove})
print(finalresult)
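The same selection also works without lxml, because the standard library's xml.etree.ElementTree accepts the limited XPath subset used above (child paths plus [@attr='value'] predicates) in findall(). The sketch below is self-contained and runs offline on a made-up miniature document whose shape is only an assumption inferred from the answer's XPath expressions, not a copy of the real feed. It maps blank values to None rather than 0.0 so that missing data stays distinguishable. The names SAMPLE_XML, to_float, and parse_chart are hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up miniature document; its structure is assumed from the answer's
# XPath expressions (series/value and graphs/graph[@title=...]/value).
SAMPLE_XML = """<chart>
  <series>
    <value>1/20/2009</value>
    <value>1/21/2009</value>
  </series>
  <graphs>
    <graph title="Approve"><value>67.0</value><value></value></graph>
    <graph title="Disapprove"><value>12.0</value><value></value></graph>
  </graphs>
</chart>"""


def to_float(text):
    """Convert a <value> text to float, mapping blank or missing text to None."""
    return float(text) if text and text.strip() else None


def parse_chart(xml_source):
    """Return a dict of columns (Date, Approve, Disapprove) from the chart XML."""
    root = ET.fromstring(xml_source)
    dates = [v.text for v in root.findall("series/value")]
    approve = [to_float(v.text)
               for v in root.findall("graphs/graph[@title='Approve']/value")]
    disapprove = [to_float(v.text)
                  for v in root.findall("graphs/graph[@title='Disapprove']/value")]
    assert len(dates) == len(approve) == len(disapprove)
    return {"Date": dates, "Approve": approve, "Disapprove": disapprove}


columns = parse_chart(SAMPLE_XML)
print(columns)
```

For the live data you could pass the raw bytes of the download (for example requests.get(url).content) to parse_chart, and the returned dict can be handed straight to pd.DataFrame.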
Output:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1727 entries, 0 to 1726
Data columns (total 3 columns):
Date 1727 non-null values
Approve 1727 non-null values
Disapprove 1727 non-null values
dtypes: float64(2), object(1)
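Every column reports 1727 non-null values even though the question notes that the trailing Approve/Disapprove entries are blank. That happens because the answer's `if s.text else 0.0` turns those blanks into 0.0, so they look like real readings of zero. If you would rather drop them, one option in pandas is to use float('nan') instead of 0.0 and call finalresult.dropna(). A minimal stdlib-only sketch of the same filtering, with the hypothetical helper drop_incomplete_rows, treats both None and NaN as missing:

```python
import math


def drop_incomplete_rows(dates, approve, disapprove):
    """Keep only rows whose Approve and Disapprove values are real numbers.

    None and NaN both count as missing, so this works whether blanks were
    parsed as None or as float('nan').
    """
    def present(x):
        return x is not None and not math.isnan(x)

    return [(d, a, dis)
            for d, a, dis in zip(dates, approve, disapprove)
            if present(a) and present(dis)]


rows = drop_incomplete_rows(["d1", "d2", "d3"],
                            [45.0, None, float("nan")],
                            [50.0, 48.0, 47.0])
print(rows)  # [('d1', 45.0, 50.0)]
```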