Extract value from web page

Views: 47 Published: 2021/7/17 18:45:00 Tags: php screen-scraping


Problem description


Hi, I have a website's home page that I am reading in using cURL, and I need to grab the number of pages that the site has.

The information is in a div:

<div class="pager">
<span class="page-numbers current">1</span>
<a href="/users?page=2" title="go to page 2"><span class="page-numbers">2</span></a>
<a href="/users?page=3" title="go to page 3"><span class="page-numbers">3</span></a>
<a href="/users?page=4" title="go to page 4"><span class="page-numbers">4</span></a>
<a href="/users?page=5" title="go to page 5"><span class="page-numbers">5</span></a>
<span class="page-numbers dots">&hellip;</span>

<a href="/users?page=15" title="go to page 15"><span class="page-numbers">15</span></a>
<a href="/users?page=2" title="go to page 2"><span class="page-numbers next"> next</span></a>
</div>

The value I need is 15. It could be any number depending on the site, but it will always be in the same position.

How can I read this value easily and assign it to a variable in PHP?

Thanks

Jonathan

Solution

You can use PHP's DOM extension for that. Read the page with DOMDocument::loadHTMLFile(), then create a DOMXPath object and query all span elements in the document whose class attribute is exactly "page-numbers".

(edit: oops, that's not what you're looking for, see second code snippet)

$html = '<html><head><title>:::</title></head><body>
<div class="pager">
<span class="page-numbers current">1</span>
<a href="/users?page=2" title="go to page 2"><span class="page-numbers">2</span></a>
<a href="/users?page=3" title="go to page 3"><span class="page-numbers">3</span></a>
<a href="/users?page=4" title="go to page 4"><span class="page-numbers">4</span></a>
<a href="/users?page=5" title="go to page 5"><span class="page-numbers">5</span></a>
<span class="page-numbers dots">&hellip;</span>

<a href="/users?page=15" title="go to page 15"><span class="page-numbers">15</span></a>
<a href="/users?page=2" title="go to page 2"><span class="page-numbers next"> next</span></a>
</div>
</body></html>';

$doc = new DOMDocument;
// since the content "is already here" we use loadHTML(content)
// instead of loadHTMLFile(url)
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$nodelist = $xpath->query('//span[@class="page-numbers"]');
echo 'there are ', $nodelist->length, ' span elements having class="page-numbers"';
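
One detail of the query above that is easy to miss: `@class="page-numbers"` compares the whole attribute string, so spans such as `class="page-numbers current"` are not counted. Here is a minimal sketch of the difference, assuming a shortened version of the pager markup from the question. The helper name `countSpans` is made up for illustration and is not part of the DOM extension.

```php
<?php
// Shortened version of the pager markup from the question.
$html = '<div class="pager">
<span class="page-numbers current">1</span>
<a href="/users?page=2" title="go to page 2"><span class="page-numbers">2</span></a>
<span class="page-numbers dots">&hellip;</span>
<a href="/users?page=15" title="go to page 15"><span class="page-numbers">15</span></a>
<a href="/users?page=2" title="go to page 2"><span class="page-numbers next"> next</span></a>
</div>';

// Hypothetical helper: count the nodes matched by an XPath expression.
function countSpans(string $html, string $expr): int
{
    $doc = new DOMDocument;
    libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);
    return $xpath->query($expr)->length;
}

// Exact string comparison: only class="page-numbers" matches.
$exact = countSpans($html, '//span[@class="page-numbers"]');

// Token match: any span whose class list contains "page-numbers".
$token = countSpans(
    $html,
    '//span[contains(concat(" ", normalize-space(@class), " "), " page-numbers ")]'
);

echo "exact: $exact, token: $token\n";
```

The `concat`/`normalize-space` idiom is the usual XPath 1.0 way to test for one class among several. The same idea applies to `div[@class="pager"]` if the real page ever adds a second class to that div.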

edit: does this

<a href="/users?page=15" title="go to page 15"><span class="page-numbers">15</span></a>

(the second-to-last a element) always point to the last page, i.e. does this link contain the value you're looking for?
If so, you can use an XPath expression that selects the second-to-last a element and, from there, its child span element.

//div[@class="pager"] <- select each <div> where the attribute class equals "pager"
//div[@class="pager"]/a <- select each <a> that is a direct child of the pager div
//div[@class="pager"]/a[position()=last()-1] <- select the <a> that is second to last
//div[@class="pager"]/a[position()=last()-1]/span <- select the direct child <span> of that second-to-last <a> element in the pager <div>

( you might want to fetch a good XPath tutorial ;-) )

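// $html is the same string as in the first snippet
$doc = new DOMDocument;
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
// the <span> inside the second-to-last <a> of the pager div
$nodelist = $xpath->query('//div[@class="pager"]/a[position()=last()-1]/span');
if ( 0 < $nodelist->length ) {
  echo $nodelist->item(0)->nodeValue;
}
else {
  echo 'not found';
}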
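
Since the question asks for the value in a PHP variable, the lookup can be wrapped in a small function that returns an integer. This is only a sketch under a few assumptions: the function name `lastPageNumber` is invented here, the pager markup is the one from the question, and the HTML string is assumed to have been fetched already (for example with cURL and `CURLOPT_RETURNTRANSFER` set).

```php
<?php
// Hypothetical helper: returns the last page number, or null if it cannot be found.
function lastPageNumber(string $html): ?int
{
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument;
    $previous = libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Same expression as above: the <span> inside the second-to-last <a> of the pager.
    $nodes = $xpath->query('//div[@class="pager"]/a[position()=last()-1]/span');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    $text = trim($nodes->item(0)->nodeValue);
    // Only accept plain digits, so a layout change does not yield a bogus number.
    return preg_match('/^\d+$/', $text) === 1 ? (int) $text : null;
}

// In practice $html would be the return value of curl_exec(); a literal stands in here.
$html = '<div class="pager">
<span class="page-numbers current">1</span>
<a href="/users?page=2" title="go to page 2"><span class="page-numbers">2</span></a>
<span class="page-numbers dots">&hellip;</span>
<a href="/users?page=15" title="go to page 15"><span class="page-numbers">15</span></a>
<a href="/users?page=2" title="go to page 2"><span class="page-numbers next"> next</span></a>
</div>';

$lastPage = lastPageNumber($html);
var_dump($lastPage);
```

Returning `null` instead of echoing 'not found' leaves it to the caller to decide what a missing pager means, for instance a site with only a single page.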
