python: error with basic XML parsing (with lxml)
Problem Description
I am trying to parse an XML file with python using lxml, but get an error on basic attempts. I use this post and the lxml tutorials to bootstrap.
My XML file is basically built from records below (I trimmed it down so that it is easier to read):
<?xml version="1.0" ?>
<?xml-stylesheet href="file:///usr/share/nmap/nmap.xsl" type="text/xsl"?>
<nmaprun scanner="nmap" args="nmap -sV -p135,12345 -oX 10.232.0.0.16.xml 10.232.0.0/16" start="1340201347" startstr="Wed Jun 20 16:09:07 2012" version="5.21" xmloutputversion="1.03">
  <host>
    <hostnames>
      <hostname name="host1.example.com" type="PTR"/>
    </hostnames>
  </host>
</nmaprun>
I run it through this complicated script:
from lxml import etree

d = etree.parse("myfile.xml")
for host in d.findall("host"):
    aa = host.find("hostnames/hostname")
    print(aa.attrib["name"])
I get AttributeError: 'NoneType' object has no attribute 'attrib' on the print line.
I checked the value of d, host and aa and they are all defined as Elements.
Upfront apologies if this is something obvious (and it probably is).
I added the header of the XML file as requested (I am still reading and rereading the answers :))
Thanks!
Recommended Answer
Though it would make more sense to use XPath, your code already works fine when standing alone, so long as one handles the case where a host has no hostnames found:
import lxml.etree

doc = lxml.etree.XML("""
<nmaprun>
  <host>
    <hostnames>
      <hostname name="host1.example.com" type="PTR"/>
    </hostnames>
  </host>
</nmaprun>""")

for host in doc.findall('host'):
    host_el = host.find('hostnames/hostname')
    # find() returns None when the host has no hostname element
    if host_el is not None:
        print(host_el.attrib['name'])
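To see why the None check matters, here is a minimal sketch that adds a hypothetical second host whose hostnames element is empty. That empty record is an assumption about what the problematic entries in the real file look like. For such a host, find() returns None, which is exactly the value the original print line then tries to read .attrib from.

```python
import lxml.etree

# Hypothetical sample: the second <host> has an empty <hostnames/> element,
# standing in (by assumption) for the records that trigger the error.
doc = lxml.etree.XML("""
<nmaprun>
  <host>
    <hostnames>
      <hostname name="host1.example.com" type="PTR"/>
    </hostnames>
  </host>
  <host>
    <hostnames/>
  </host>
</nmaprun>""")

names = []
for host in doc.findall('host'):
    host_el = host.find('hostnames/hostname')
    # find() returns None when nothing matches; calling .attrib on None
    # is what raises AttributeError in the question's script.
    if host_el is None:
        continue
    names.append(host_el.attrib['name'])

# The second host yields no hostname element at all.
missing = doc.findall('host')[1].find('hostnames/hostname')

print(names)    # ['host1.example.com']
print(missing)  # None
```

Skipping the None case (or filtering it out up front) is enough; the first host is still reported normally.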
With XPath (doc.xpath() rather than doc.find() or doc.findall()), one could do better, filtering only for hostnames with a name and thus avoiding the faulty records altogether:
host[hostnames/hostname/@name] will find host elements that have at least one hostnames element containing a hostname with a name attribute.
//hostnames/hostname/@name will directly return only the names themselves (with lxml, these are exposed as strings).
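A short sketch of both expressions, run against the same kind of hypothetical document (the second, empty host is an assumption for illustration). doc.xpath() is called here on the root nmaprun element, so the relative path host[...] selects its child host elements, while the //... path searches the whole document.

```python
import lxml.etree

# Hypothetical sample with one named host and one host without a hostname.
doc = lxml.etree.XML("""
<nmaprun>
  <host>
    <hostnames>
      <hostname name="host1.example.com" type="PTR"/>
    </hostnames>
  </host>
  <host>
    <hostnames/>
  </host>
</nmaprun>""")

# Only <host> elements that have a hostname carrying a name attribute;
# the empty host is filtered out by the predicate.
hosts = doc.xpath('host[hostnames/hostname/@name]')

# The attribute values themselves, returned by lxml as string results.
names = doc.xpath('//hostnames/hostname/@name')

print(len(hosts))  # 1
print(names)       # ['host1.example.com']
```

Because the predicate already excludes hosts without a name, no None check is needed when iterating over hosts.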