Parsing XML in Python using ElementTree example

Question

I'm having a hard time finding a good, basic example of how to parse XML in Python using ElementTree. From what I can find, this appears to be the easiest library to use for parsing XML. Here is a sample of the XML I'm working with:

<timeSeriesResponse>
    <queryInfo>
        <locationParam>01474500</locationParam>
        <variableParam>99988</variableParam>
        <timeParam>
            <beginDateTime>2009-09-24T15:15:55.271</beginDateTime>
            <endDateTime>2009-11-23T15:15:55.271</endDateTime>
        </timeParam>
     </queryInfo>
     <timeSeries name="NWIS Time Series Instantaneous Values">
         <values count="2876">
            <value dateTime="2009-09-24T15:30:00.000-04:00" qualifiers="P">550</value>
            <value dateTime="2009-09-24T16:00:00.000-04:00" qualifiers="P">419</value>
            <value dateTime="2009-09-24T16:30:00.000-04:00" qualifiers="P">370</value>
            .....
         </values>
     </timeSeries>
</timeSeriesResponse>

I am able to do what I need, using a hard-coded method. But I need my code to be a bit more dynamic. Here is what worked:

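import xml.etree.ElementTree as ET

tree = ET.parse("sample.xml")
doc = tree.getroot()

# Hard-coded positions. In the trimmed sample above, <values> is the only
# child of <timeSeries>; index 2 presumably reflects the full file.
timeseries = doc[1]
values = timeseries[2]

for child in values:
    print(child.attrib['dateTime'], child.text)
# first line printed: 2009-09-24T15:30:00.000-04:00 550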

Here are a couple of things I've tried. Neither worked; both reported that they couldn't find timeSeries (or anything else I tried):

tree = ET.parse("sample.xml")
tree.find('timeSeries')

tree = ET.parse("sample.xml")
doc = tree.getroot()
doc.find('timeSeries')

Basically, I want to load the XML file, search for the timeSeries tag, and iterate through the value tags, returning the dateTime and the value of the tag itself: everything I'm doing in the above example, but without hard-coding the sections of XML I'm interested in. Can anyone point me to some examples, or give me some suggestions on how to work through this?

Thanks for all the help. Both of the suggestions below worked on the sample file I provided; however, they didn't work on the full file. Here is the error I get from the real file when I use Ed Carrel's method:

 (<type 'exceptions.AttributeError'>, AttributeError("'NoneType' object has no attribute 'attrib'",), <traceback object at 0x011EFB70>)

I figured there was something in the real file it didn't like, so I incrementally removed things until it worked. Here are the lines that I changed:

originally: <timeSeriesResponse xsi:schemaLocation="a URL I removed" xmlns="a URL I removed" xmlns:xsi="a URL I removed">
changed to: <timeSeriesResponse>

originally: <sourceInfo xsi:type="SiteInfoType">
changed to: <sourceInfo>

originally: <geogLocation xsi:type="LatLonPointType" srs="EPSG:4326">
changed to: <geogLocation>

Removing the attributes that have 'xsi:...' fixed the problem. Is the 'xsi:...' not valid XML? It will be hard for me to remove these programmatically. Any suggested workarounds?

Here is the full XML file: http://www.sendspace.com/file/lofcpt

When I originally asked this question, I was unaware of namespaces in XML. Now that I know what's going on, I don't have to remove the "xsi" attributes, which are the namespace declarations. I just include them in my xpath searches. See this page for more info on namespaces in lxml.
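For readers hitting the same problem, here is a minimal sketch of a namespace-qualified search. It uses the standard-library xml.etree.ElementTree in Python 3 rather than lxml, and it rests on several assumptions: the namespace URI is a made-up placeholder (the real one was removed from the question), the xsi URI is the conventional XML Schema instance one, and NS_URI, NAMESPACED_XML and the "ts" prefix are names chosen only for this illustration. Note that the xsi:type attributes are valid XML in themselves; it is the default xmlns declaration on the root that changes how every tag has to be addressed.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI: the real one was removed from the question.
NS_URI = "http://example.com/timeseries"

# Trimmed stand-in for the full file, with a default namespace on the root.
NAMESPACED_XML = f"""
<timeSeriesResponse xmlns="{NS_URI}"
                    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <timeSeries name="NWIS Time Series Instantaneous Values">
        <sourceInfo xsi:type="SiteInfoType"/>
        <values count="2876">
            <value dateTime="2009-09-24T15:30:00.000-04:00" qualifiers="P">550</value>
        </values>
    </timeSeries>
</timeSeriesResponse>
"""

root = ET.fromstring(NAMESPACED_XML)

# The default namespace becomes part of every tag name, so a bare
# name no longer matches and find() returns None.
unqualified = root.find("timeSeries")

# Map a prefix of our choosing ("ts") to the URI and use it in the path.
ns = {"ts": NS_URI}
qualified = root.find("ts:timeSeries", ns)
values = [(v.get("dateTime"), v.text)
          for v in root.findall("ts:timeSeries/ts:values/ts:value", ns)]

print(unqualified)  # None
print(qualified.attrib)
print(values)
```

The prefix used in the search path does not have to match any prefix in the document; only the URI it maps to matters.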

Recommended answer

So I have ElementTree 1.2.6 on my box now, and ran the following code against the XML chunk you posted:

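# Standalone ElementTree 1.2.6 package; with the standard library, use
# "import xml.etree.ElementTree as ET" instead.
import elementtree.ElementTree as ET

tree = ET.parse("test.xml")
doc = tree.getroot()
thingy = doc.find('timeSeries')

print(thingy.attrib)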

And got the following:

{'name': 'NWIS Time Series Instantaneous Values'}

It appears to have found the timeSeries element without needing to use numerical indices.
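Extending this to the rest of the question, a minimal sketch of looking up the value elements by tag name rather than by position might look like the following. It assumes Python 3 and the standard-library xml.etree.ElementTree module (not the standalone 1.2.6 package used above), and it parses an inline, trimmed copy of the sample XML instead of a file. SAMPLE_XML and read_values are names made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample from the question (no namespaces), inlined
# so the sketch is self-contained.
SAMPLE_XML = """
<timeSeriesResponse>
    <queryInfo>
        <locationParam>01474500</locationParam>
    </queryInfo>
    <timeSeries name="NWIS Time Series Instantaneous Values">
        <values count="2876">
            <value dateTime="2009-09-24T15:30:00.000-04:00" qualifiers="P">550</value>
            <value dateTime="2009-09-24T16:00:00.000-04:00" qualifiers="P">419</value>
            <value dateTime="2009-09-24T16:30:00.000-04:00" qualifiers="P">370</value>
        </values>
    </timeSeries>
</timeSeriesResponse>
"""


def read_values(root):
    """Return (dateTime, text) pairs for every <value> under <timeSeries>."""
    # The path is relative to root: timeSeries -> values -> value.
    return [(v.attrib["dateTime"], v.text)
            for v in root.findall("timeSeries/values/value")]


root = ET.fromstring(SAMPLE_XML)
pairs = read_values(root)
for date_time, text in pairs:
    print(date_time, text)
```

Because the path names each level, the lookup keeps working if other children are added before values, which is where the positional version breaks.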

What would be useful now is knowing what you mean when you say "it doesn't work." Since it works for me given the same input, it is unlikely that ElementTree is broken in some obvious way. Update your question with any error messages, backtraces, or anything you can provide to help us help you.
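One way to gather that kind of detail is to print what the parser actually sees before calling find(). The sketch below is only an illustration under stated assumptions: Python 3 with the standard-library xml.etree.ElementTree, and a made-up one-line document (DOC, with a placeholder namespace URI) standing in for the file that fails.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in document; in practice this would be the failing file.
DOC = '<root xmlns="http://example.com/ns"><timeSeries name="demo"/></root>'

root = ET.fromstring(DOC)

# Tags are reported with any namespace URI in braces.
child_tags = [child.tag for child in root]
print(root.tag)
print(child_tags)

# find() returns None instead of raising when nothing matches, so check
# the result before touching .attrib.
node = root.find("timeSeries")
if node is None:
    message = "no direct child named 'timeSeries'; available: %s" % child_tags
else:
    message = "found: %r" % node.attrib
print(message)
```

Comparing the printed tags with the name passed to find() usually shows quickly why a lookup came back empty.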
