xmlns namespace breaking lxml

Problem description

I am trying to open an xml file, and get values from certain tags. I have done this a lot but this particular xml is giving me some issues. Here is a section of the xml file:

<?xml version='1.0' encoding='UTF-8'?>
<package xmlns="http://apple.com/itunes/importer" version="film4.7">
  <provider>filmgroup</provider>
  <language>en-GB</language>
  <actor name="John Smith" display="Doe John"></actor>
</package>

And here is a sample of my python code:

from lxml import etree

metadata = '/Users/mylaptop/Desktop/Python/metadata.xml'
parser = etree.XMLParser(remove_blank_text=True)
# etree.parse() opens the file itself, so a separate open() call is not needed.
tree = etree.parse(metadata, parser)
root = tree.getroot()
# A bare name such as 'provider' only matches elements in no namespace,
# so this loop never runs for the namespaced <provider> element.
for element in root.iter(tag='provider'):
    providerValue = element.text
    print(providerValue)
tree.write('/Users/mylaptop/Desktop/Python/metadataDone.xml', pretty_print=True, xml_declaration=True, encoding='UTF-8')

When I run this, it can't find the provider tag or its value. If I remove xmlns="http://apple.com/itunes/importer", everything works as expected. My question is: how can I remove this namespace, since I'm not interested in it, so that I can get the tag values I need using lxml?
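To see why the bare name fails, it helps to print the tag names lxml actually stores. A minimal sketch, assuming lxml is installed and using an inline, trimmed copy of the XML above (with the <actor> element closed so it parses):

```python
from lxml import etree

# Trimmed copy of the question's XML; the <actor> element is closed so it parses.
XML = b"""<?xml version='1.0' encoding='UTF-8'?>
<package xmlns="http://apple.com/itunes/importer" version="film4.7">
  <provider>filmgroup</provider>
  <language>en-GB</language>
  <actor name="John Smith" display="Doe John"></actor>
</package>"""

root = etree.fromstring(XML)

# The default namespace becomes part of every element's tag name.
print(root.tag)     # {http://apple.com/itunes/importer}package
print(root[0].tag)  # {http://apple.com/itunes/importer}provider

# A bare name is treated as "no namespace", so nothing matches.
print(list(root.iter(tag='provider')))  # []
```

In other words, the xmlns attribute is not just decoration: it changes the name of every element beneath it.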

Recommended answer

The provider tag is in the http://apple.com/itunes/importer namespace, so you either need to use the fully qualified name

{http://apple.com/itunes/importer}provider

or use one of the lxml methods that accept a namespaces parameter, such as root.xpath. Then you can refer to the element with a namespace prefix (e.g. ns:provider):

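from lxml import etree

metadata = '/Users/mylaptop/Desktop/Python/metadata.xml'
parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse(metadata, parser)
root = tree.getroot()
# Map an arbitrary prefix ('ns') to the document's default namespace URI.
namespaces = {'ns':'http://apple.com/itunes/importer'}
# The union returns matches in document order: provider text, actor name, ...
items = iter(root.xpath('//ns:provider/text()|//ns:actor/@name',
                       namespaces=namespaces))
# zip(*[items]*2) pulls from the same iterator twice, pairing consecutive items.
for provider, actor in zip(*[items]*2):
    print(provider, actor)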

yields

filmgroup John Smith
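Both options described above also work with the simpler find/findall/findtext methods, not just xpath. A minimal sketch, assuming lxml is installed and using an inline copy of the sample XML (with the <actor> element closed); the prefix name 'ns' is arbitrary:

```python
from lxml import etree

# Trimmed copy of the question's XML; the <actor> element is closed so it parses.
XML = b"""<?xml version='1.0' encoding='UTF-8'?>
<package xmlns="http://apple.com/itunes/importer" version="film4.7">
  <provider>filmgroup</provider>
  <language>en-GB</language>
  <actor name="John Smith" display="Doe John"></actor>
</package>"""

NS = 'http://apple.com/itunes/importer'
root = etree.fromstring(XML)

# Option 1: the fully qualified "{namespace-uri}localname" form.
provider = root.find('{%s}provider' % NS)
print(provider.text)  # filmgroup

# Option 2: a prefix-to-URI mapping passed via the namespaces parameter.
namespaces = {'ns': NS}
print(root.findtext('ns:language', namespaces=namespaces))  # en-GB
for actor in root.findall('ns:actor', namespaces=namespaces):
    print(actor.get('name'))  # John Smith
```

The prefix in the mapping does not have to match anything in the file; only the URI has to match the xmlns value exactly.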

Note that the XPath expression used above (//ns:provider/text()|//ns:actor/@name) assumes that <provider> and <actor> elements always appear in alternation. If that is not true, there are of course ways to handle it, but the code becomes a bit more verbose:

for package in root.xpath('//ns:package', namespaces=namespaces):
    # Relative paths (no leading //) search only the children of this <package>.
    for provider in package.xpath('ns:provider', namespaces=namespaces):
        providerValue = provider.text
        print(providerValue)
    for actor in package.xpath('ns:actor', namespaces=namespaces):
        print(actor.attrib['name'])
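If, as the question asks, you would rather drop the namespace entirely, one common approach (not part of the answer above) is to rename every element to its local name after parsing and then let lxml remove the now-unused declaration. A minimal sketch, assuming lxml is installed and an inline copy of the sample XML; strip_namespaces is a hypothetical helper name, and it only rewrites element tags, not namespaced attributes:

```python
from lxml import etree

# Trimmed copy of the question's XML; the <actor> element is closed so it parses.
XML = b"""<?xml version='1.0' encoding='UTF-8'?>
<package xmlns="http://apple.com/itunes/importer" version="film4.7">
  <provider>filmgroup</provider>
  <language>en-GB</language>
  <actor name="John Smith" display="Doe John"></actor>
</package>"""

def strip_namespaces(root):
    """Hypothetical helper: rename elements to their local names, drop unused xmlns."""
    for element in root.iter():
        # Comments and processing instructions have a non-string tag; skip them.
        if isinstance(element.tag, str):
            element.tag = etree.QName(element.tag).localname
    # Remove namespace declarations that no element uses any more.
    etree.cleanup_namespaces(root)
    return root

root = strip_namespaces(etree.fromstring(XML))
print(root.findtext('provider'))       # filmgroup
print(root.find('actor').get('name'))  # John Smith
```

Keep in mind that if the stripped tree is written back with tree.write(), the output will no longer carry xmlns="http://apple.com/itunes/importer", which whatever consumes the file may require; when the file has to round-trip, the namespace-aware lookups are the safer choice.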
