XPath for selecting a section of an article
Suppose a section of an article looks like this (the HTML source):
<h2>Introduction</h2>
....
<h2>References</h2>
...a bunch of text...
<h2>Further Readings</h2> //optional
.....
I would like to know whether an XPath expression can extract the "References" part of the example above.
I tried something like //h2[contains(.,'References')]/following::*
but I don't know how to specify where my desired section ends, so it returns the rest of the document.
If you want the elements up to the next h2, use an XPath like this:
//*[following-sibling::h2[preceding-sibling::h2[1][contains(.,'References')]] and preceding-sibling::h2[contains(.,'References')]]
What it means: it finds every element that has
-- a following h2 sibling whose 1st preceding h2 contains 'References'
-- a preceding h2 sibling containing 'References'
The 1st rule takes all elements from the beginning of the XML up to the next h2 tag. The 2nd takes everything after the required h2 tag to the end of the XML. Their intersection gives the needed elements.
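To see the two rules and their intersection in action, here is a small sketch in Python. It assumes the third-party lxml library is installed, and the sample markup (the wrapping div and the p elements standing in for the elided text) is made up for illustration.

```python
from lxml import etree  # assumption: third-party lxml is available

# Hypothetical sample page; the <p> elements stand in for the elided section bodies.
SAMPLE = """
<div>
  <h2>Introduction</h2>
  <p>intro text</p>
  <h2>References</h2>
  <p>ref 1</p>
  <p>ref 2</p>
  <h2>Further Readings</h2>
  <p>more</p>
</div>
"""

root = etree.fromstring(SAMPLE.strip())

# Rule 1: elements followed by an h2 whose nearest preceding h2 is "References".
before_next = ("//*[following-sibling::h2"
               "[preceding-sibling::h2[1][contains(.,'References')]]]")
# Rule 2: elements preceded by the "References" h2.
after_refs = "//*[preceding-sibling::h2[contains(.,'References')]]"
# Both rules in one predicate, as in the expression above.
both = ("//*[following-sibling::h2"
        "[preceding-sibling::h2[1][contains(.,'References')]]"
        " and preceding-sibling::h2[contains(.,'References')]]")


def texts(expr):
    """Return the text of every element matched by expr, in document order."""
    return [el.text for el in root.xpath(expr)]


print(texts(before_next))  # everything before the h2 that follows References
print(texts(after_refs))   # everything after the References h2
print(texts(both))         # ['ref 1', 'ref 2']
```

Note that rule 1 relies on an h2 coming after the References section, so on a page where "Further Readings" is missing this expression selects nothing.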
Or an XPath built on your suggestion:
//h2[.='References']/following-sibling::*[preceding-sibling::h2[1][contains(.,'References')] and not(name()='h2')]
It takes everything after the required h2 tag, //h2[.='References']/following-sibling::*
that is not an h2 and has our h2 tag as the 1st h2 before it.
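A quick way to check this second expression, including the case where the optional "Further Readings" heading is absent, is the sketch below. It assumes the third-party lxml library; the two sample documents and the helper name section are invented for the example.

```python
from lxml import etree  # assumption: third-party lxml is available

# The second expression from the answer, unchanged.
EXPR = ("//h2[.='References']/following-sibling::*"
        "[preceding-sibling::h2[1][contains(.,'References')]"
        " and not(name()='h2')]")

# Hypothetical documents: one with a heading after References, one without.
WITH_NEXT = (
    "<div><h2>Introduction</h2><p>intro</p>"
    "<h2>References</h2><p>ref 1</p><p>ref 2</p>"
    "<h2>Further Readings</h2><p>more</p></div>"
)
WITHOUT_NEXT = (
    "<div><h2>Introduction</h2><p>intro</p>"
    "<h2>References</h2><p>ref 1</p><p>ref 2</p></div>"
)


def section(xml, expr=EXPR):
    """Parse xml and return the text of each element matched by expr."""
    root = etree.fromstring(xml)
    return [el.text for el in root.xpath(expr)]


print(section(WITH_NEXT))     # ['ref 1', 'ref 2']
print(section(WITHOUT_NEXT))  # ['ref 1', 'ref 2']
```

Because this version does not require an h2 after the section, it still returns the References content when that section is the last one. It does match the heading text exactly (.='References'), so a heading with extra whitespace or other text would need contains() in that step as well.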