Python lxml - using the xml:lang attribute to retrieve an element

Question

I have some xml which has multiple elements with the same name, but each is in a different language, for example:

<Title xml:lang="FR" type="main">Les Tudors</Title>
<Title xml:lang="DE" type="main">Die Tudors</Title>
<Title xml:lang="IT" type="main">The Tudors</Title>

Normally, I'd retrieve an element using its attributes as follows:

titlex = info.find('.//xmlns:Title[@someattribute=attributevalue]', namespaces=nsmap)

If I try and do this with [@xml:lang="FR"] (for example), I get the traceback error:

  File "D:/Python code/RBM CRID, Title, Genre/CRID, Title, Genre, Age rating, Episode Number, Descriptions V1.py", line 29, in <module>
    titlex = info.find('.//xmlns:Title[@xml:lang=PL]', namespaces=nsmap) 

  File "lxml.etree.pyx", line 1457, in lxml.etree._Element.find (src\lxml\lxml.etree.c:51435)

  File "C:\Python34\lib\site-packages\lxml\_elementpath.py", line 282, in find
    it = iterfind(elem, path, namespaces)

  File "C:\Python34\lib\site-packages\lxml\_elementpath.py", line 272, in iterfind
    selector = _build_path_iterator(path, namespaces)

  File "C:\Python34\lib\site-packages\lxml\_elementpath.py", line 256, in _build_path_iterator
    selector.append(ops[token[0]](_next, token))

  File "C:\Python34\lib\site-packages\lxml\_elementpath.py", line 134, in prepare_predicate
    token = next()

  File "C:\Python34\lib\site-packages\lxml\_elementpath.py", line 80, in xpath_tokenizer
    raise SyntaxError("prefix %r not found in prefix map" % prefix)

SyntaxError: prefix 'xml' not found in prefix map

I'm not surprised by this, but I'd like suggestions on how to get around the issue.

Thanks!

As requested, a cut-down but complete set of code (It works as expected if I remove the [bitsinsquarebrackets]):

import lxml
import codecs

file_name = (input('Enter the file name, excluding .xml extension: ') + '.xml')# User inputs file name
print('Parsing ' + file_name)


#----- Sets up import and namespace

from lxml import etree

parser = lxml.etree.XMLParser()


tree = lxml.etree.parse(file_name, parser)                                 # Name of file to test goes here
root = tree.getroot()

nsmap = {'xmlns': 'urn:tva:metadata:2012',
         'mpeg7': 'urn:tva:mpeg7:2008'}

#----- This code writes the output to a file

with codecs.open(file_name+'.log', mode='w', encoding='utf-8') as f:                        # Name the output file
    f.write(u'CRID|Title|Genre|Rating|Short Synopsis|Medium Synopsis|Long Synopsis\n')
    for info in root.xpath('//xmlns:ProgramInformation', namespaces=nsmap):
       titlex = info.find('.//xmlns:Title[xml:lang="PL"]', namespaces=nsmap)             # Retrieve the title
       title = titlex.text if titlex is not None else 'Missing'         # If there isn't a title, print an alternative word
       f.write(u'{}\n'.format(title))                     # Write all the retrieved values to the same line with bar separators and a new line

Recommended answer

The xml prefix in xml:lang does not need to be declared in an XML document, but if you want to use xml:lang in XPath lookups, you have to define a prefix mapping in the Python code.

The xml prefix is reserved (as opposed to "normal" namespace prefixes which are arbitrary) and defined to be bound to http://www.w3.org/XML/1998/namespace. See the Namespaces in XML 1.0 W3C recommendation.
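
To make the reserved binding concrete, here is a minimal sketch of how lxml exposes the attribute once a document is parsed. The one-element sample string and the names XML_NS and lang_key are illustrative assumptions, not part of the question's data. The attribute is stored under a "{namespace-uri}localname" key, which is why a prefix map is needed before the short form xml:lang can be resolved in a path expression.

```python
from lxml import etree

# Namespace URI that the reserved xml prefix is always bound to.
XML_NS = "http://www.w3.org/XML/1998/namespace"

# The document never declares the xml prefix; the parser resolves it anyway.
elem = etree.fromstring('<Title xml:lang="FR" type="main">Les Tudors</Title>')

# lxml stores namespaced attributes under "{namespace-uri}localname" keys.
lang_key = "{%s}lang" % XML_NS
attribute_names = sorted(elem.attrib.keys())
lang_value = elem.get(lang_key)

print(attribute_names)
print(lang_value)
```

Reading the attribute with elem.get() and the full key needs no prefix map at all, which can be handy when filtering elements in a plain Python loop.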

Example:

from lxml import etree

# Required mapping
nsmap = {"xml": "http://www.w3.org/XML/1998/namespace"}

XML = """
<root>
  <Title xml:lang="FR" type="main">Les Tudors</Title>
  <Title xml:lang="DE" type="main">Die Tudors</Title>
  <Title xml:lang="IT" type="main">The Tudors</Title>
</root>"""

doc = etree.fromstring(XML)

title_FR = doc.find('Title[@xml:lang="FR"]', namespaces=nsmap)
print(title_FR.text)

Output:

Les Tudors
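
The same mapping can be merged into the prefix map from the question. The following is a sketch under assumptions: the SAMPLE document (its TVAMain and BasicDescription wrappers in particular) is hypothetical and only stands in for the real file, and iterfind() is used for the outer loop instead of the question's xpath() call so that every lookup goes through the same ElementPath machinery.

```python
from lxml import etree

# Prefix map from the question, plus the reserved xml prefix.
nsmap = {
    "xmlns": "urn:tva:metadata:2012",
    "mpeg7": "urn:tva:mpeg7:2008",
    "xml": "http://www.w3.org/XML/1998/namespace",
}

# Hypothetical sample document; the real file layout is an assumption.
SAMPLE = """
<TVAMain xmlns="urn:tva:metadata:2012">
  <ProgramInformation>
    <BasicDescription>
      <Title xml:lang="FR" type="main">Les Tudors</Title>
      <Title xml:lang="DE" type="main">Die Tudors</Title>
    </BasicDescription>
  </ProgramInformation>
</TVAMain>"""

root = etree.fromstring(SAMPLE)

titles = []
for info in root.iterfind(".//xmlns:ProgramInformation", namespaces=nsmap):
    # Note the @ in front of xml:lang: it selects an attribute, not a child element.
    titlex = info.find('.//xmlns:Title[@xml:lang="DE"]', namespaces=nsmap)
    titles.append(titlex.text if titlex is not None else "Missing")

print(titles)
```

The comparison is a plain string match, so the language code in the predicate has to use the same case as the document ("DE" here).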


If there is no mapping for the xml prefix, you get the "prefix 'xml' not found in prefix map" error. If the URI mapped to the xml prefix is not http://www.w3.org/XML/1998/namespace, the find method in the code snippet above raises no error but returns None, because no attribute matches.
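
The cases described above can be checked side by side. This is a small sketch; the example.com URI is a deliberately wrong, made-up value, and the variable names are illustrative only.

```python
from lxml import etree

doc = etree.fromstring(
    '<root><Title xml:lang="FR" type="main">Les Tudors</Title></root>'
)
path = 'Title[@xml:lang="FR"]'

# Case 1: no mapping for the xml prefix, so find() raises SyntaxError.
try:
    doc.find(path)
    error_message = None
except SyntaxError as exc:
    error_message = str(exc)

# Case 2: xml mapped to a wrong (made-up) URI: no error, but nothing matches.
wrong_map = {"xml": "http://example.com/not-the-xml-namespace"}
wrong_result = doc.find(path, namespaces=wrong_map)

# Case 3: xml mapped to the reserved URI: the French title is found.
right_map = {"xml": "http://www.w3.org/XML/1998/namespace"}
right_result = doc.find(path, namespaces=right_map)

print(error_message)
print(wrong_result)
print(right_result.text)
```

The second case is the easier one to miss, since a typo in the URI fails silently and looks the same as a missing title.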
