Parse XML with Python when multiple children share a name

Question

I currently have an XML file I am trying to parse. Here is my code thus far.

from xml.etree import ElementTree

with open('data.xml', 'rt') as f:
    tree = ElementTree.parse(f)

for node in tree.iter('Host'):
    hostname = node.find('Name').text
    ip = node.find('Networking/IP').text
    print(hostname)
    print(ip)

However, I am running into an issue: each of these devices has 3 IP addresses, so there are multiple XML "children" with the exact same name. Here is a sample (actual hostname obfuscated):

<?xml version="1.0" encoding="UTF-8"?>
<APIResponse>
  <HostRecords>
    <Type>Dedicated</Type>
      <Host>
        <Name>dc-01-a.domain.com</Name>
        <Active>1</Active>
        <Networking>
          <Primary>Yes</Weight>
          <IP>10.0.8.72</IP>
        </Networking>
        <Networking>
          <Primary>No</Weight>
          <IP>10.12.12.1</IP>
        </Networking>
        <Networking>
          <Primary>Yes</Weight>
          <IP>fd30:0000:0000:0001:ff4e:003e:0009:000e</IP>
        </Networking>
      </Host>
    </Type>
  </HostRecords>
</APIResponse>

So my test script pulls the first IP, but how do I pull the next two? 'Networking/IP' is the exact same path in all 3 spots, yet find only returns one. Also, how would I make it grab only the IPs that are labeled as Primary?

If I try findall instead of find, I get:

AttributeError: 'list' object has no attribute 'text'

If I remove the .text part, I get:

[<Element 'RData' at 0x10ef67650>, <Element 'RData' at 0x10ef67750>, <Element 'RData' at 0x10ef67850>]

So it returns the elements, but not the actual readable data.

Recommended answer

The find and findall methods accept a limited subset of XPath expressions. You can use this to extract only the IPs that are marked as Primary:

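from xml.etree import ElementTree

# `sample` must hold the XML document as a string (well-formed, see the note below)
tree = ElementTree.fromstring(sample)

for node in tree.iter('Host'):
    hostname = node.find('Name').text
    # findall returns a list of every matching element, not just the first
    ips = node.findall("Networking[Primary='Yes']/IP")
    print(hostname)
    for ip in ips:
        print(ip.text)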

For further information on what XPath expressions are allowed see the documentation at: https://docs.python.org/2/library/xml.etree.elementtree.html#xml.etree.ElementTree.Element
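On the AttributeError from the question: findall returns a plain Python list of Element objects, so .text has to be read from each element rather than from the list itself. The following is a minimal sketch of that idea, not part of the original answer. The shortened `SAMPLE` string is a hypothetical, well-formed stand-in for one host record, and the second list shows an alternative that filters on Primary in Python with findtext instead of an XPath predicate.

```python
from xml.etree import ElementTree

# Hypothetical, well-formed stand-in for one host record from the question.
SAMPLE = """\
<Host>
  <Name>dc-01-a.domain.com</Name>
  <Networking><Primary>Yes</Primary><IP>10.0.8.72</IP></Networking>
  <Networking><Primary>No</Primary><IP>10.12.12.1</IP></Networking>
</Host>
"""

host = ElementTree.fromstring(SAMPLE)

# findall returns a list of Element objects, so read .text per element.
all_ips = [ip.text for ip in host.findall('Networking/IP')]

# Alternative to the XPath predicate: filter in Python with findtext,
# which returns the text of the first matching child (or None).
primary_ips = [
    net.findtext('IP')
    for net in host.findall('Networking')
    if net.findtext('Primary') == 'Yes'
]

print(all_ips)
print(primary_ips)
```

Filtering in Python is a little more verbose than the predicate, but it may be easier to extend if the condition later depends on more than one child element.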

The sample XML provided in the question is malformed in a couple of places (presumably this happened when it was obfuscated for posting; otherwise the code example given could never have worked). The Type tag is closed twice, and the opening Primary tags are mismatched with closing Weight tags.
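For reference, here is one possible repair of the sample, run end to end. This is a sketch under assumptions: the stray closing Type tag is simply dropped (the real API may instead nest Host inside Type), each Primary element is closed with a matching Primary tag, and the XML declaration is omitted for brevity. The names `FIXED_SAMPLE` and `primary_by_host` are illustrative only.

```python
from xml.etree import ElementTree

# Assumed repair of the question's sample: the stray </Type> is dropped and
# each <Primary> is closed with </Primary> instead of </Weight>.
FIXED_SAMPLE = """\
<APIResponse>
  <HostRecords>
    <Type>Dedicated</Type>
    <Host>
      <Name>dc-01-a.domain.com</Name>
      <Active>1</Active>
      <Networking>
        <Primary>Yes</Primary>
        <IP>10.0.8.72</IP>
      </Networking>
      <Networking>
        <Primary>No</Primary>
        <IP>10.12.12.1</IP>
      </Networking>
      <Networking>
        <Primary>Yes</Primary>
        <IP>fd30:0000:0000:0001:ff4e:003e:0009:000e</IP>
      </Networking>
    </Host>
  </HostRecords>
</APIResponse>
"""

root = ElementTree.fromstring(FIXED_SAMPLE)

# Map each hostname to the list of its Primary IP addresses.
primary_by_host = {}
for host in root.iter('Host'):
    name = host.findtext('Name')
    primary_by_host[name] = [
        ip.text for ip in host.findall("Networking[Primary='Yes']/IP")
    ]

print(primary_by_host)
```

With this repaired input, the predicate skips the Networking block whose Primary value is No and keeps the other two addresses.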
