python: error with basic XML parsing (with lxml)


Question

I am trying to parse an XML file with python using lxml, but get an error on basic attempts. I use this post and the lxml tutorials to bootstrap.

My XML file is basically built from records below (I trimmed it down so that it is easier to read):

<?xml version="1.0" ?>
<?xml-stylesheet href="file:///usr/share/nmap/nmap.xsl" type="text/xsl"?>
<nmaprun scanner="nmap" args="nmap -sV -p135,12345 -oX 10.232.0.0.16.xml 10.232.0.0/16" start="1340201347" startstr="Wed Jun 20 16:09:07 2012" version="5.21" xmloutputversion="1.03">
<host>
  <hostnames>
    <hostname name="host1.example.com" type="PTR"/>
  </hostnames>
</host>
</nmaprun>

I run it through this complicated script:

from lxml import etree

d = etree.parse("myfile.xml")
for host in d.findall("host"):
    aa = host.find("hostnames/hostname")
    print(aa.attrib["name"])

I get AttributeError: 'NoneType' object has no attribute 'attrib' on the print line. I checked the value of d, host and aa and they are all defined as Elements.

Upfront apologies if this is something obvious (and it probably is).

I added the header of the XML file as requested (I am still reading and rereading the answers :))

Thanks!

Recommended answer

Though it would make more sense to use XPath, your code already works fine when standing alone, so long as one handles the case where a host has no hostnames found:

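import lxml.etree

# Parse an inline document shaped like the trimmed nmap output.
doc = lxml.etree.XML("""
  <nmaprun>
    <host>
      <hostnames>
        <hostname name="host1.example.com" type="PTR"/>
      </hostnames>
    </host>
  </nmaprun>""")
for host in doc.findall('host'):
    # find() returns None when a host has no <hostname> child.
    host_el = host.find('hostnames/hostname')
    if host_el is not None:
        print(host_el.attrib['name'])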
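
To see why the None check matters, here is a minimal sketch that reproduces the error from the question. The input is an assumption: a hypothetical host record whose hostnames element is empty, which is one plausible shape for the records that were trimmed out of the sample.

```python
from lxml import etree

# Hypothetical record (not from the original file): a host whose
# <hostnames> element has no <hostname> child.
doc = etree.XML("""
<nmaprun>
  <host>
    <hostnames/>
  </host>
</nmaprun>""")

host = doc.find("host")
hostname_el = host.find("hostnames/hostname")

# find() does not raise when nothing matches; it returns None.
print(hostname_el is None)

# Accessing .attrib on that None triggers the error from the question.
error_message = None
try:
    hostname_el.attrib["name"]
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)
```

If the real file contains even one host like this, the loop in the question fails on that host even though every other record parses correctly.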

With XPath (doc.xpath() rather than doc.find() or doc.findall()), one could do better, filtering only for hostnames with a name and thus avoiding the faulty records altogether:

  • host[hostnames/hostname/@name] will find hosts that have at least one hostnames element containing a hostname with a name attribute.
  • //hostnames/hostname/@name will directly return only the names themselves (if using lxml, exposing these as strings).
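
The two expressions above can be tried out with a short sketch like the following. The sample document is hypothetical: it adds a second host without any hostname so that the filtering is visible. The variable names are illustrative only.

```python
from lxml import etree

# Hypothetical sample: the second host has no <hostname> child.
SAMPLE = """
<nmaprun>
  <host>
    <hostnames>
      <hostname name="host1.example.com" type="PTR"/>
    </hostnames>
  </host>
  <host>
    <hostnames/>
  </host>
</nmaprun>"""

doc = etree.XML(SAMPLE)

# Keep only the <host> elements that have a hostname with a name attribute.
named_hosts = doc.xpath("host[hostnames/hostname/@name]")

# Fetch the attribute values directly; lxml returns them as string objects.
names = doc.xpath("//hostnames/hostname/@name")

print(len(named_hosts))
for name in names:
    print(name)
```

Because the second expression yields only attribute values that exist, no None check is needed inside the loop.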
