Get text followed by certain text or get all text if that text is missing

Problem description

I need to get the text from HTML pages, but some of them contain unnecessary text that comes after a certain marker paragraph in the page ('---------'). For example, HTML page 1:

...
<p> This is correct text. Everything after it is wrong</p>
<p>---------</p>
<p><strong>This is wrong text</strong></p>
<p> This is wrong another text</p>
...

Example of HTML page 2:

...
<p> This is correct text. Everything after it is wrong</p>
<p> This text is also valid </p>
<p> This is another correct text</p>
...

So if the page contains '---------', I need to grab only the text before it; otherwise, I need to grab everything. As noted here (Get text followed by certain text), I can use:

//p[following-sibling::p[contains(.,'---------')]][1]/text()

for the first example. But is there a way to use one XPath for both cases?

Recommended answer

//p[    not(contains(.,'---------')) 
    and not(preceding-sibling::p[contains(.,'---------')])]//text()

will return

This is correct text. Everything after it is wrong

for your first case, and

This is correct text. Everything after it is wrong
This text is also valid
This is another correct text

for your second case, as requested.
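
To make the selection rule concrete, here is a minimal sketch of the same logic written as a plain Python loop rather than as XPath. It uses the standard library's `xml.etree.ElementTree`, whose limited XPath subset supports neither `contains()` nor the `preceding-sibling` axis, so the expression above cannot be passed to it directly. The `<div>` wrapper, the `SEPARATOR` constant, and the `texts_before_separator` helper are assumptions made for illustration; the sketch also assumes the fragment is well-formed XML and that all the `<p>` elements are siblings under one parent, as in the question's samples.

```python
import xml.etree.ElementTree as ET

SEPARATOR = "---------"  # marker text taken from the question


def texts_before_separator(parent):
    """Return the text of each <p> child that precedes the separator.

    Mirrors the XPath predicate: keep a <p> only if neither it nor any
    preceding sibling <p> contains the separator. If no separator exists,
    every <p> is kept.
    """
    result = []
    for p in parent.findall("p"):  # direct <p> children, in document order
        full_text = "".join(p.itertext())  # includes text of nested tags
        if SEPARATOR in full_text:
            break  # this <p> and all following siblings are unwanted
        result.append(full_text.strip())  # unlike XPath, trims whitespace
    return result


# The two sample fragments, wrapped in a hypothetical <div> root.
page1 = ET.fromstring(
    "<div>"
    "<p> This is correct text. Everything after it is wrong</p>"
    "<p>---------</p>"
    "<p><strong>This is wrong text</strong></p>"
    "<p> This is wrong another text</p>"
    "</div>"
)
page2 = ET.fromstring(
    "<div>"
    "<p> This is correct text. Everything after it is wrong</p>"
    "<p> This text is also valid </p>"
    "<p> This is another correct text</p>"
    "</div>"
)

print(texts_before_separator(page1))
print(texts_before_separator(page2))
```

For page 1 this yields only the first paragraph, and for page 2 all three, matching the results listed above. With a full XPath 1.0 engine the single expression from the answer is the simpler choice; the loop is only meant to show why one rule covers both cases.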
