Tags with : in name in lxml
Question
I'm trying to use lxml.etree to parse a WordPress export document (it's XML, somewhat RSS-like). I'm only interested in published posts, so I'm using the following to loop through them:
for item in data.findall("item"):
    if item.find("wp:post_type").text != "post":
        continue
    if item.find("wp:status").text != "publish":
        continue
    write_post(item)
where data is the tag that all item tags are found in. item tags contain posts, pages, and drafts. My problem is that lxml can't find tags that have a : in their name (e.g. wp:post_type). When I try item.find("wp:post_type") I get this error:
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "lxml.etree.pyx", line 1279, in lxml.etree._Element.find (src/lxml/lxml.etree.c:38124)
  File "/usr/lib64/python2.7/site-packages/lxml/_elementpath.py", line 210, in find
    it = iterfind(elem, path)
  File "/usr/lib64/python2.7/site-packages/lxml/_elementpath.py", line 200, in iterfind
    selector = _build_path_iterator(path)
  File "/usr/lib64/python2.7/site-packages/lxml/_elementpath.py", line 184, in _build_path_iterator
    selector.append(ops[token[0]](_next, token))
KeyError: ':'
I assume the KeyError: ':' refers to the colon in the name of the tag being invalid. Is there some way I can escape the colon so that lxml finds the right tag? Does : have some special meaning in this context? Or am I doing something wrong? Any help would be appreciated.
Recommended answer
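The : is the XML namespace separator: in wp:post_type, wp is a namespace prefix and post_type is the local tag name. The colon cannot be escaped in lxml. Instead, replace the prefix and colon with the namespace URL in curly braces, as in item.find("{http://example.org/}status").text. Here http://example.org/ is only a placeholder; use the URL that the xmlns:wp attribute declares in your export file.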
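As a rough sketch of how the loop from the question might look with namespace-qualified names, the example below filters a tiny made-up export. The sample XML, the published_posts helper, and the namespace URL http://wordpress.org/export/1.2/ are assumptions for illustration; check the xmlns:wp declaration in your own file, since the URL differs between export versions. It also shows the alternative of passing a prefix map through the namespaces argument of findtext(), which reasonably recent lxml versions accept.

```python
try:
    from lxml import etree  # the library used in the question
except ImportError:
    # Standard-library fallback; it accepts the same find() syntax used here.
    import xml.etree.ElementTree as etree

# Assumed namespace URL: copy the real one from xmlns:wp in your export file.
WP_NS = "http://wordpress.org/export/1.2/"

# Hypothetical miniature export with one post, one page and one draft.
SAMPLE = """
<rss xmlns:wp="http://wordpress.org/export/1.2/">
  <channel>
    <item><title>Hello</title><wp:post_type>post</wp:post_type><wp:status>publish</wp:status></item>
    <item><title>About</title><wp:post_type>page</wp:post_type><wp:status>publish</wp:status></item>
    <item><title>WIP</title><wp:post_type>post</wp:post_type><wp:status>draft</wp:status></item>
  </channel>
</rss>
"""


def published_posts(channel):
    """Return the titles of all items that are published posts."""
    ns = {"wp": WP_NS}  # prefix map for the second lookup style
    titles = []
    for item in channel.findall("item"):
        # Style 1: "{namespace-url}local-name" written out in full.
        if item.findtext("{%s}post_type" % WP_NS) != "post":
            continue
        # Style 2: keep the prefix and tell find()/findtext() what it means.
        if item.findtext("wp:status", namespaces=ns) != "publish":
            continue
        titles.append(item.findtext("title"))
    return titles


root = etree.fromstring(SAMPLE.strip())
print(published_posts(root.find("channel")))
```

Both lookup styles resolve to the same element; the prefix map tends to read better when many wp: tags are queried, while the curly-brace form needs no extra argument.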