lxml attributes require full namespace


Question

The code below reads a table from an Excel 2003 XML workbook using lxml (Python 3.3). The code works fine, but to access the Type attribute of the Data element via the get() method I need to use the key '{urn:schemas-microsoft-com:office:spreadsheet}Type'. Why is this, given that I've specified this namespace with the ss prefix?

All I can think of is that this namespace appears twice in the document, once with a namespace prefix and once without, i.e.

<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:o="urn:schemas-microsoft-com:office:office"
 xmlns:x="urn:schemas-microsoft-com:office:excel"
 xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
 xmlns:html="http://www.w3.org/TR/REC-html40">

In the file, the element and attribute are declared as below: the Type attribute with the ss: prefix, and the Cell and Data elements with no prefix. However, the declarations say both belong to the same namespace, 'urn:schemas-microsoft-com:office:spreadsheet', so surely the parser should treat them equivalently?

<Cell><Data ss:Type="String">QB11128020</Data></Cell>

My code:

from lxml import etree

# Open in binary mode so lxml can honor the encoding declared in the XML file.
with open(filename, 'rb') as f:
    doc = etree.parse(f)

# Prefix-to-URI mapping; it is only consulted by xpath(), not by get().
namespaces = {'o': 'urn:schemas-microsoft-com:office:office',
              'x': 'urn:schemas-microsoft-com:office:excel',
              'ss': 'urn:schemas-microsoft-com:office:spreadsheet'}

ws = doc.xpath('/ss:Workbook/ss:Worksheet', namespaces=namespaces)
if ws:
    tables = ws[0].xpath('./ss:Table', namespaces=namespaces)
    if tables:
        rows = tables[0].xpath('./ss:Row', namespaces=namespaces)
        for row in rows:
            cells = row.xpath('./ss:Cell/ss:Data', namespaces=namespaces)
            for cell in cells:
                print(cell.text)
                print(cell.keys())
                # get() expects the attribute name as {namespace URI}localname.
                print(cell.get('{urn:schemas-microsoft-com:office:spreadsheet}Type'))

Recommended answer

According to the lxml.etree tutorial (Namespaces section):

The ElementTree API avoids namespace prefixes wherever possible and deploys the real namespaces (the URI) instead:
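To make that concrete, here is a minimal sketch of how names show up once a document is parsed. The XML string is a hypothetical, cut-down stand-in for the workbook in the question, not the real file. It also illustrates a detail of XML namespaces: the default namespace applies to unprefixed elements but not to unprefixed attributes, so the ss: prefix on Type is what places it in the spreadsheet namespace.

```python
from lxml import etree

# Hypothetical minimal document modelled on the workbook header in the question.
xml = (
    b'<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet" '
    b'xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">'
    b'<Cell><Data ss:Type="String">QB11128020</Data></Cell>'
    b'</Workbook>'
)
root = etree.fromstring(xml)
data = root[0][0]  # Workbook -> Cell -> Data

SS = 'urn:schemas-microsoft-com:office:spreadsheet'

# Element tags and attribute keys are both stored as {URI}localname.
print(data.tag)
print(data.keys())

# Prefixes are not part of the stored name, so these lookups find nothing.
print(data.get('ss:Type'))  # None
print(data.get('Type'))     # None

# Only the fully qualified name matches.
print(data.get('{%s}Type' % SS))  # String
```

The prefix is just a document-local alias for the URI, which is why the API keys everything on the URI itself.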


BTW, the following

cell.get('{urn:schemas-microsoft-com:office:spreadsheet}Type')

can be written as:

cell.get('{%(ss)s}Type' % namespaces)

or:

cell.get('{{{0[ss]}}}Type'.format(namespaces))
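Two further alternatives may be worth considering; neither comes from the original answer. The sketch below assumes a standalone Data element built purely for illustration. etree.QName builds the qualified name from a URI and a local name, and an XPath attribute step can use the ss prefix because xpath() resolves prefixes through the namespaces mapping.

```python
from lxml import etree

namespaces = {'ss': 'urn:schemas-microsoft-com:office:spreadsheet'}

# Hypothetical standalone element standing in for one parsed Data cell.
cell = etree.fromstring(
    b'<Data xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" '
    b'ss:Type="String">QB11128020</Data>'
)

# QName(uri, localname).text is the {URI}localname string that get() expects.
type_key = etree.QName(namespaces['ss'], 'Type')
via_qname = cell.get(type_key.text)

# XPath resolves the ss prefix via the mapping and returns a list of matches.
via_xpath = cell.xpath('@ss:Type', namespaces=namespaces)

print(via_qname)
print(via_xpath)
```

The QName form avoids hand-built format strings, while the XPath form keeps the lookup consistent with the prefixed queries used elsewhere in the question's code.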
