Avoiding certain elements within a class when using XPath

Problem Description

I want to pull the text inside the a tags, but not the text of the span (class "newly") that says "New listing". Using XPath, how can I get just the following text:

NEW! CALL OF DUTY: WWII (Microsoft XBOX ONE DISC 2017) WW2 Factory Sealed!

PHP SCRAPER

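// $ebayurl is assumed to hold the eBay search-results URL (defined elsewhere).
$document = new DOMDocument( '1.0', 'UTF-8' );
$document->preserveWhiteSpace = false;
// Suppress libxml warnings about malformed real-world HTML while parsing.
$internalErrors = libxml_use_internal_errors( true );
$ebayhtml = file_get_contents( $ebayurl );
$document->loadHTML( $ebayhtml );
libxml_use_internal_errors( $internalErrors );

$xpath = new DOMXPath( $document );
// Selects each listing's title link. nodeValue (below) returns ALL of its text,
// including the <span class="newly">New listing</span> child.
$headers = $xpath->query( '//h3[@class="lvtitle"]/a' );
$ebayx = 0;

foreach ( $headers as $title ) {
    // Print at most 10 titles.
    if ( $ebayx > 9 ) {
        break;
    }
    $header = $title->nodeValue . PHP_EOL;
    // Truncate long titles to 60 characters.
    $header = strlen( $header ) > 60 ? substr( $header, 0, 60 ) . "..." : $header;
    echo '<pre>';
    echo $header;
    echo '</pre>';
    $ebayx++;
}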

HTML CODE BEING SCRAPED

<a href="https://www.ebay.com/itm/NEW-CALL-OF-DUTY-WWII-Microsoft-XBOX-ONE-DISC-2017-WW2-Factory-Sealed/173060343645?epid=237222746&amp;hash=item284b33475d:g:Xf4AAOSwI8laCc~I" class="vip" title="Click this link to access NEW! CALL OF DUTY: WWII (Microsoft XBOX ONE DISC 2017) WW2 Factory Sealed!"><span class="newly">New listing</span>
        NEW! CALL OF DUTY: WWII (Microsoft XBOX ONE DISC 2017) WW2 Factory Sealed!</a>
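To see why the original query returns the "New listing" prefix, here is a minimal sketch that loads a trimmed copy of the scraped markup and prints the link's nodeValue. It assumes, as the scraper's XPath implies, that the a element sits inside an h3 with class lvtitle; that wrapper and the shortened href are illustrative assumptions, not copied from the page. For an element node, nodeValue is the concatenated text of all descendant text nodes, so the span's text is included.

```php
<?php
// Trimmed sample of the scraped markup. The <h3 class="lvtitle"> wrapper is an
// assumption implied by the scraper's XPath; the href is shortened.
$html = '<h3 class="lvtitle"><a href="https://www.ebay.com/itm/173060343645" class="vip">'
      . '<span class="newly">New listing</span>' . "\n"
      . '        NEW! CALL OF DUTY: WWII (Microsoft XBOX ONE DISC 2017) WW2 Factory Sealed!</a></h3>';

$document = new DOMDocument('1.0', 'UTF-8');
$previous = libxml_use_internal_errors(true); // silence HTML parser warnings
$document->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($document);
$anchor = $xpath->query('//h3[@class="lvtitle"]/a')->item(0);

// nodeValue of an element joins the text of ALL descendant text nodes,
// so the <span>'s "New listing" ends up at the start of the string.
$fullText = trim($anchor->nodeValue);
echo $fullText, PHP_EOL;
```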

Solution

If this XPath,

//h3[@class="lvtitle"]/a

selects the targeted a element, then this XPath,

//h3[@class="lvtitle"]/a/text()

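will select only its immediate text node children and so exclude the span child element, as requested. The selected text node still begins with the line break and indentation that follow the span, so trim the result (for example with PHP's trim()) before displaying it.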
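Applied to the scraper, one possible sketch selects each title link, then queries text() relative to that link so only its direct text-node children are read, and trims them. It parses the sample markup from the question (with the same assumed h3 wrapper) instead of fetching $ebayurl, and the helper name extractTitles and its parameters are hypothetical. Joining all direct text nodes per link, rather than taking the first one, is a defensive choice in case a title is split around another child element.

```php
<?php
/**
 * Hypothetical helper: returns the titles of the first $limit listings,
 * reading only the direct text-node children of each <a>, so child <span>
 * text such as "New listing" is skipped.
 */
function extractTitles(string $html, int $limit = 10, int $maxLength = 60): array
{
    $document = new DOMDocument('1.0', 'UTF-8');
    $previous = libxml_use_internal_errors(true);
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($document);
    $titles = [];

    foreach ($xpath->query('//h3[@class="lvtitle"]/a') as $anchor) {
        if (count($titles) >= $limit) {
            break;
        }
        // 'text()' evaluated with $anchor as context: immediate text children only.
        $parts = [];
        foreach ($xpath->query('text()', $anchor) as $textNode) {
            $parts[] = $textNode->nodeValue;
        }
        // Remove the newline/indentation left over after the <span>.
        $title = trim(implode('', $parts));
        if (strlen($title) > $maxLength) {
            $title = substr($title, 0, $maxLength) . '...';
        }
        $titles[] = $title;
    }
    return $titles;
}

// Sample markup (assumed <h3> wrapper, shortened href).
$html = '<h3 class="lvtitle"><a href="https://www.ebay.com/itm/173060343645" class="vip">'
      . '<span class="newly">New listing</span>' . "\n"
      . '        NEW! CALL OF DUTY: WWII (Microsoft XBOX ONE DISC 2017) WW2 Factory Sealed!</a></h3>';

foreach (extractTitles($html) as $title) {
    // Escape scraped text before echoing it into HTML.
    echo '<pre>', htmlspecialchars($title), '</pre>', PHP_EOL;
}
```

With the default 60-character limit the sample title is cut to "NEW! CALL OF DUTY: WWII (Microsoft XBOX ONE DISC 2017) WW2 F...", matching the truncation rule in the original scraper.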
