How to match a text node then follow parent nodes using XPath


Question


I'm trying to parse some HTML with XPath. Following the simplified XML example below, I want to match the string 'Text 1', then grab the contents of the relevant content node.

<doc>
    <block>
        <title>Text 1</title>
        <content>Stuff I want</content>
    </block>

    <block>
        <title>Text 2</title>
        <content>Stuff I don't want</content>
    </block>
</doc>

My Python code throws a wobbly:

>>> from lxml import etree
>>>
>>> tree = etree.XML("<doc><block><title>Text 1</title><content>Stuff I want</content></block><block><title>Text 2</title><content>Stuff I don't want</content></block></doc>")
>>>
>>> # get all titles
... tree.xpath('//title/text()')
['Text 1', 'Text 2']
>>>
>>> # match 'Text 1'
... tree.xpath('//title/text()="Text 1"')
True
>>>
>>> # Follow parent from selected nodes
... tree.xpath('//title/text()/../..//text()')
['Text 1', 'Stuff I want', 'Text 2', "Stuff I don't want"]
>>>
>>> # Follow parent from selected node
... tree.xpath('//title/text()="Text 1"/../..//text()')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lxml.etree.pyx", line 1330, in lxml.etree._Element.xpath (src/lxml/lxml.etree.c:14542)
  File "xpath.pxi", line 287, in lxml.etree.XPathElementEvaluator.__call__ (src/lxml/lxml.etree.c:90093)
  File "xpath.pxi", line 209, in lxml.etree._XPathEvaluatorBase._handle_result (src/lxml/lxml.etree.c:89446)
  File "xpath.pxi", line 194, in lxml.etree._XPathEvaluatorBase._raise_eval_error (src/lxml/lxml.etree.c:89281)
lxml.etree.XPathEvalError: Invalid type

Is this possible in XPath? Do I need to express what I want to do in a different way?

Solution

Is this what you want?

//title[text()='Text 1']/../content/text()
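To see why this works where the attempt in the question did not: in XPath 1.0 an `=` comparison only yields a boolean, and as far as I can tell the trailing `/../..` steps in the failing expression end up attached to the string literal `"Text 1"` rather than to a node-set, which would explain the `Invalid type` error. A predicate in square brackets filters the `title` elements instead, so the result is still a node-set you can navigate from. Below is a minimal sketch against the question's sample document; the variable names (`wanted`, `by_block`, `by_sibling`) are only illustrative, and the two alternative expressions are equivalent formulations I am adding, not part of the original answer.

```python
from lxml import etree

# Same sample document as in the question.
tree = etree.XML(
    "<doc>"
    "<block><title>Text 1</title><content>Stuff I want</content></block>"
    "<block><title>Text 2</title><content>Stuff I don't want</content></block>"
    "</doc>"
)

# The answer's expression: keep only <title> elements whose text is 'Text 1',
# step up to the parent <block>, then down into its <content>.
wanted = tree.xpath("//title[text()='Text 1']/../content/text()")
print(wanted)  # ['Stuff I want']

# Alternative (assumption, not from the answer): filter the <block> itself
# by the string value of its <title> child, so no parent step is needed.
by_block = tree.xpath("//block[title='Text 1']/content/text()")
print(by_block)  # ['Stuff I want']

# Alternative (assumption, not from the answer): move sideways from the
# matched <title> to its <content> sibling.
by_sibling = tree.xpath(
    "//title[text()='Text 1']/following-sibling::content/text()"
)
print(by_sibling)  # ['Stuff I want']
```

All three return a list, so if you expect exactly one match you would still need to index it (for example `wanted[0]`) and handle the empty case.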
