XPath. Select 'A' tag text BUT only up to specific text value

Problem description

I have the following HTML code that I'm reading from a movies web site:

<div class="blue">
    Director <a href="http://...">Bobby Farrelly</a>, <a href="http://...">Peter Farrelly</a>. With <a href="http://...">Jim Carrey</a>, <a href="http://...">Jeff Daniels</a>.
    <div class="red">
         page 1
    </div>
</div>

I'm trying to separate the director(s) from the actors using XPath. As you can see:

directors are: Bobby Farrelly and Peter Farrelly

actors are: Jim Carrey and Jeff Daniels

The only way to distinguish directors from actors in this badly formed XML is to detect the string ". With" and select the A tags up to it.

By using:

foreach($r as $result) {
    $tag = $result->getElementsByTagName("a");
    foreach($tag as $text) {
        $t = trim(preg_replace("/[\r\n]+/", " ", $text->nodeValue));
    }
}

I can select the DIV and the text inside the A tags. But this selects ALL the A tags; to get only the directors, I need to select only the text inside the A tags up to the ". With" string.

Solution

One possible XPath:

//div[@class="blue"]/a[following-sibling::text()[contains(., "With")]]

The XPath above reads: find every div whose class attribute equals "blue". Then, within each such div, select every <a> child that is followed by a sibling text node containing the text "With".

Output in an XPath tester:

'<a href="http://...">Bobby Farrelly</a>'
'<a href="http://...">Peter Farrelly</a>'
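
To tie this back to the PHP in the question, here is a minimal sketch of how the expression might be run with DOMDocument and DOMXPath. It makes a few assumptions: the sample markup is inlined with the unterminated href quote closed, linkTexts() is a hypothetical helper name introduced only for illustration, and the second query (preceding-sibling instead of following-sibling) is an untested-against-the-real-site guess at how the actors could be picked up the same way.

```php
<?php
// Sample markup from the question; the unterminated href quote is closed here (assumption).
$html = <<<'HTML'
<div class="blue">
    Director <a href="http://...">Bobby Farrelly</a>, <a href="http://...">Peter Farrelly</a>. With <a href="http://...">Jim Carrey</a>, <a href="http://...">Jeff Daniels</a>.
    <div class="red">
         page 1
    </div>
</div>
HTML;

// Hypothetical helper: run an XPath expression and return the cleaned text of each match.
function linkTexts(DOMXPath $xpath, string $expression): array
{
    $texts = [];
    foreach ($xpath->query($expression) as $node) {
        // Same whitespace cleanup as in the question's loop.
        $texts[] = trim(preg_replace("/[\r\n]+/", " ", $node->nodeValue));
    }
    return $texts;
}

libxml_use_internal_errors(true); // keep libxml warnings about imperfect HTML quiet
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// <a> children that still have a "With" text node after them: the directors.
$directors = linkTexts($xpath, '//div[@class="blue"]/a[following-sibling::text()[contains(., "With")]]');

// Assumed complement: <a> children that have a "With" text node before them.
$actors = linkTexts($xpath, '//div[@class="blue"]/a[preceding-sibling::text()[contains(., "With")]]');

echo "Directors: ", implode(", ", $directors), "\n";
echo "Actors: ", implode(", ", $actors), "\n";
```

Note that contains() is case-sensitive and matches "With" anywhere in a sibling text node, so a page where that word also appears elsewhere inside the same div would need a stricter test.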
