XPath谓词带有带有lxml的子路径? [英] XPath predicate with sub-paths with lxml?

查看:163
本文介绍了XPath谓词带有带有lxml的子路径?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我试图了解发送给我的XPath,以便与ACORD XML表单(保险中的通用格式)一起使用.他们发送给我的XPath为简洁起见被删减了:

I'm trying to understand and XPath that was sent to me for use with ACORD XML forms (common format in insurance). The XPath they sent me is (truncated for brevity):

./PersApplicationInfo/InsuredOrPrincipal[InsuredOrPrincipalInfo/InsuredOrPrincipalRoleCd="AN"]/GeneralPartyInfo

我遇到麻烦的地方是Python的 lxml告诉我[InsuredOrPrincipalInfo/InsuredOrPrincipalRoleCd="AN"]是一个invalid predicate.我无法在关于谓词的XPath规范中找到任何地方这种语法,以便我可以修改此谓词以使其起作用.

Where I'm running into trouble is that Python's lxml library is telling me that [InsuredOrPrincipalInfo/InsuredOrPrincipalRoleCd="AN"] is an invalid predicate. I'm not able to find anywhere in the XPath spec on predicates which identifies this syntax so that I can modify this predicate to work.

是否有关于该谓词确切选择内容的文档?另外,这甚至是一个有效的谓词,还是某个地方被篡改了?

Is there any documentation on what exactly this predicate is selecting? Also, is this even a valid predicate, or has something been mangled somewhere?

可能相关:

我相信与我合作的公司是一家MS商店,那么此XPath可能在C#或该堆栈中的某些其他语言上有效吗?我不太确定.

I believe the company I am working with is an MS shop, so this XPath may be valid in C# or some other language in that stack? I'm not entirely sure.

更新:

根据评论需求,这里有一些附加信息.

Per comment demand, here is some additional info.

XML示例:

<ACORD>
  <InsuranceSvcRq>
    <HomePolicyQuoteInqRq>
      <PersPolicy>
        <PersApplicationInfo>
            <InsuredOrPrincipal>
                <InsuredOrPrincipalInfo>
                    <InsuredOrPrincipalRoleCd>AN</InsuredOrPrincipalRoleCd>
                </InsuredOrPrincipalInfo>
                <GeneralPartyInfo>
                    <Addr>
                        <Addr1></Addr1>
                    </Addr>
                </GeneralPartyInfo>
            </InsuredOrPrincipal>
        </PersApplicationInfo>
      </PersPolicy>
    </HomePolicyQuoteInqRq>
  </InsuranceSvcRq>
</ACORD>

代码示例(具有完整的XPath而不是代码段):

Code sample (with full XPath instead of snippet):

>>> from lxml import etree
>>> tree = etree.fromstring(raw)
>>> tree.find('./InsuranceSvcRq/HomePolicyQuoteInqRq/PersPolicy/PersApplicationInfo/InsuredOrPrincipal[InsuredOrPrincipalInfo/InsuredOrPrincipalRoleCd="AN"]/GeneralPartyInfo/Addr/Addr1')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "lxml.etree.pyx", line 1409, in lxml.etree._Element.find (src/lxml/lxml.etree.c:39972)
  File "/Library/Python/2.5/site-packages/lxml-2.3-py2.5-macosx-10.3-i386.egg/lxml/_elementpath.py", line 271, in find
    it = iterfind(elem, path, namespaces)
  File "/Library/Python/2.5/site-packages/lxml-2.3-py2.5-macosx-10.3-i386.egg/lxml/_elementpath.py", line 261, in iterfind
    selector = _build_path_iterator(path, namespaces)
  File "/Library/Python/2.5/site-packages/lxml-2.3-py2.5-macosx-10.3-i386.egg/lxml/_elementpath.py", line 245, in _build_path_iterator
    selector.append(ops[token[0]](_next, token))
  File "/Library/Python/2.5/site-packages/lxml-2.3-py2.5-macosx-10.3-i386.egg/lxml/_elementpath.py", line 207, in prepare_predicate
    raise SyntaxError("invalid predicate")
SyntaxError: invalid predicate

推荐答案

tree.find更改为tree.xpath. lxml中提供了findfindall以提供与ElementTree的其他实现的兼容性. 这些方法无法实现整个XPath语言.要使用包含更多高级功能的XPath表达式,请使用xpath方法,XPath类或XPathEvaluator.

Change tree.find to tree.xpath. find and findall are present in lxml to provide compatibility with other implementations of ElementTree. These methods do not implement the entire XPath language. To employ XPath expressions containing more advanced features, use the xpath method, the XPath class, or XPathEvaluator.

例如:

import io
import lxml.etree as ET

content='''\
<ACORD>
  <InsuranceSvcRq>
    <HomePolicyQuoteInqRq>
      <PersPolicy>
        <PersApplicationInfo>
            <InsuredOrPrincipal>
                <InsuredOrPrincipalInfo>
                    <InsuredOrPrincipalRoleCd>AN</InsuredOrPrincipalRoleCd>
                </InsuredOrPrincipalInfo>
                <GeneralPartyInfo>
                    <Addr>
                        <Addr1></Addr1>
                    </Addr>
                </GeneralPartyInfo>
            </InsuredOrPrincipal>
        </PersApplicationInfo>
      </PersPolicy>
    </HomePolicyQuoteInqRq>
  </InsuranceSvcRq>
</ACORD>
'''
tree=ET.parse(io.BytesIO(content))
path='//PersApplicationInfo/InsuredOrPrincipal[InsuredOrPrincipalInfo/InsuredOrPrincipalRoleCd="AN"]/GeneralPartyInfo'
result=tree.xpath(path)
print(result)

收益

[<Element GeneralPartyInfo at b75a8194>]

tree.find产生

SyntaxError: invalid node predicate

这篇关于XPath谓词带有带有lxml的子路径?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆