Python XPath SyntaxError: invalid predicate


Question

I am trying to parse an XML document like this:

<document>
    <pages>

    <page>   
       <paragraph>XBV</paragraph>

       <paragraph>GHF</paragraph>
    </page>

    <page>
       <paragraph>ash</paragraph>

       <paragraph>lplp</paragraph>
    </page>

    </pages>
</document>

This is my code:

import xml.etree.ElementTree as ET

tree = ET.parse("../../xml/test.xml")

root = tree.getroot()

path="./pages/page/paragraph[text()='GHF']"

print root.findall(path)

But I get an error:

print root.findall(path)
  File "X:\Anaconda2\lib\xml\etree\ElementTree.py", line 390, in findall
    return ElementPath.findall(self, path, namespaces)
  File "X:\Anaconda2\lib\xml\etree\ElementPath.py", line 293, in findall
    return list(iterfind(elem, path, namespaces))
  File "X:\Anaconda2\lib\xml\etree\ElementPath.py", line 263, in iterfind
    selector.append(ops[token[0]](next, token))
  File "X:\Anaconda2\lib\xml\etree\ElementPath.py", line 224, in prepare_predicate
    raise SyntaxError("invalid predicate")
SyntaxError: invalid predicate

What is wrong with my XPath?

Follow-up

Thanks falsetru, your solution worked. I have a follow-up. Now I want to get all the paragraph elements that come before the paragraph with the text GHF. In this case I only need the XBV element; I want to ignore ash and lplp. I guess one way to do this would be:

result = []
# Walk the paragraphs in document order and stop at the first one whose text is GHF
for para in root.findall('./pages/page/paragraph'):
    t = para.text.encode("utf-8", "ignore")
    if t == "GHF":
        break
    else:
        result.append(para)

But is there a better way to do this?
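
One possible tidier form of that loop, as a minimal sketch rather than a definitive answer: itertools.takewhile collects paragraphs in document order and stops at the first one whose text matches. It assumes Python 3 and the sample document shown earlier; the xml_text string and the paragraphs_before helper are illustrative names, not part of the original code.

```python
import itertools
import xml.etree.ElementTree as ET

# Sample document from the question, inlined so the sketch is self-contained.
xml_text = """<document>
    <pages>
    <page>
       <paragraph>XBV</paragraph>
       <paragraph>GHF</paragraph>
    </page>
    <page>
       <paragraph>ash</paragraph>
       <paragraph>lplp</paragraph>
    </page>
    </pages>
</document>"""


def paragraphs_before(root, stop_text):
    """Return paragraphs in document order up to, but not including,
    the first one whose text equals stop_text (hypothetical helper)."""
    paragraphs = root.iterfind("./pages/page/paragraph")
    return list(itertools.takewhile(lambda p: p.text != stop_text, paragraphs))


root = ET.fromstring(xml_text)
before_ghf = [p.text for p in paragraphs_before(root, "GHF")]
print(before_ghf)
```

Note that if no paragraph has the stop text, this sketch returns every paragraph, which matches the behaviour of the loop above.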

Recommended answer

ElementTree's XPath support is limited. In particular, it does not recognise text() inside a predicate, which is why findall raises "invalid predicate". Use another library such as lxml:

import lxml.etree
root = lxml.etree.parse('test.xml')

path="./pages/page/paragraph[text()='GHF']"
print root.xpath(path)
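
If installing lxml is not an option, the standard library can still do this particular lookup. The following is a sketch, assuming Python 3.7 or later, where ElementTree accepts the [.='text'] predicate form; on older versions, filtering the elements in Python gives the same result. The xml_text string is an inline stand-in for test.xml.

```python
import xml.etree.ElementTree as ET

# Inline stand-in for test.xml (same content as the sample in the question).
xml_text = (
    "<document><pages>"
    "<page><paragraph>XBV</paragraph><paragraph>GHF</paragraph></page>"
    "<page><paragraph>ash</paragraph><paragraph>lplp</paragraph></page>"
    "</pages></document>"
)

root = ET.fromstring(xml_text)

# ElementTree rejects text() in a predicate, but since Python 3.7 it
# accepts [.='text'], which compares the element's complete text content.
matches = root.findall("./pages/page/paragraph[.='GHF']")

# Version-independent fallback: select all paragraphs, then filter in Python.
filtered = [p for p in root.iterfind("./pages/page/paragraph") if p.text == "GHF"]

print([p.text for p in matches])
print([p.text for p in filtered])
```

Both approaches stay within ElementTree's documented XPath subset; anything beyond that subset, such as XPath axes, still needs a full XPath engine like the one in lxml.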
