XPath for selecting a section of an article


Problem description

Suppose a section of an article is as follows (the html source):

<h2>Introduction</h2>
  ....
<h2>References</h2>
  ...a bunch of text...
<h2>Further Readings</h2>  //optional
  .....

I would like to know whether an XPath expression can extract the "References" part of the example above.

I tried something like //h2[contains(.,'References')]/following::*, but I don't know how to specify the end of the section I want, so it returns the rest of the document.

Solution

If you want the elements up to the next h2, use an XPath like this:

//*[following-sibling::h2[preceding-sibling::h2[1][contains(.,'References')]] and preceding-sibling::h2[contains(.,'References')]]

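What it means: it selects every element that has

-- a following h2 whose nearest preceding h2 contains 'References'

-- a preceding h2 that contains 'References'

The first condition keeps everything from the start of the document up to the h2 that follows the References heading. The second keeps everything from after the References heading to the end of the document. Their intersection is the section you need. Note that the first condition requires a later h2 sibling, so this expression selects nothing when References is the last section.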
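To make the intersection idea concrete, here is a minimal sketch in plain Python. It does not run the XPath itself: the standard-library xml.etree.ElementTree supports only a limited XPath subset without the sibling axes, so the sketch rebuilds the two sets by hand. The sample markup, the wrapping div, the p elements, and the function name section_by_intersection are all assumptions made for illustration, and the input is assumed to be well-formed XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the <div> wrapper and <p> bodies are assumptions.
HTML = """<div>
<h2>Introduction</h2><p>intro</p>
<h2>References</h2><p>ref 1</p><p>ref 2</p>
<h2>Further Readings</h2><p>more</p>
</div>"""


def section_by_intersection(parent, title):
    """Return the siblings between the h2 containing `title` and the next h2."""
    children = list(parent)
    # Index of the first h2 whose text contains the title.
    start = next(
        (i for i, el in enumerate(children)
         if el.tag == "h2" and title in "".join(el.itertext())),
        None,
    )
    if start is None:
        return []
    # Index of the next h2 after that heading.
    end = next(
        (i for i in range(start + 1, len(children)) if children[i].tag == "h2"),
        None,
    )
    if end is None:
        # Mirrors the first XPath: without a later h2 nothing is selected.
        return []
    before_next_h2 = set(range(0, end))                      # first condition
    after_heading = set(range(start + 1, len(children)))     # second condition
    return [children[i] for i in sorted(before_next_h2 & after_heading)]


root = ET.fromstring(HTML)
print([el.text for el in section_by_intersection(root, "References")])
# ['ref 1', 'ref 2']
```

The two index sets correspond to the two conditions inside the predicate, and the set intersection plays the role of the XPath `and`.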

Alternatively, an XPath can be built on your own attempt:

//h2[.='References']/following-sibling::*[preceding-sibling::h2[1][contains(.,'References')] and not(name()='h2')]

This takes everything after the required h2 (//h2[.='References']/following-sibling::*), then keeps only the elements that are not h2 themselves and whose nearest preceding h2 is the References heading.
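The selection made by this second expression can be approximated as a single pass over the siblings. The sketch below is again plain Python rather than an XPath evaluation, and it assumes well-formed markup with all headings as siblings under one parent; the function name section_after_heading and the sample documents are hypothetical. For simplicity it uses an exact text match for every heading, whereas the XPath uses an exact match for the starting h2 and contains() for the nearest preceding one.

```python
import xml.etree.ElementTree as ET


def section_after_heading(parent, title):
    """Collect non-h2 siblings whose nearest preceding h2 has text == title."""
    collected = []
    inside = False
    for el in parent:
        if el.tag == "h2":
            # A new heading opens or closes the wanted section.
            inside = "".join(el.itertext()) == title
            continue
        if inside:
            collected.append(el)
    return collected


# Hypothetical documents: one with and one without the optional last section.
with_next = ET.fromstring(
    "<div><h2>Introduction</h2><p>intro</p>"
    "<h2>References</h2><p>ref 1</p><p>ref 2</p>"
    "<h2>Further Readings</h2><p>more</p></div>"
)
without_next = ET.fromstring(
    "<div><h2>Introduction</h2><p>intro</p>"
    "<h2>References</h2><p>ref 1</p><p>ref 2</p></div>"
)

print([el.text for el in section_after_heading(with_next, "References")])
print([el.text for el in section_after_heading(without_next, "References")])
```

Because nothing here depends on a later h2, the walk still returns the section when the optional "Further Readings" heading is absent, which matches the behaviour of the second XPath.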
