Screen Scraping


Question

Hi, I'm trying to implement screen scraping on my website and have the following so far. What I ultimately want is to replace every link in the $results variable that contains "ResultsDetails.aspx?" with "results-scrape-details/" and then output the result. Can anyone point me in the right direction?

<?php 
$url = "http://mysite:90/Testing/label/stuff/ResultsIndex.aspx";
$raw = file_get_contents($url);
$newlines = array("\t","\n","\r","\x20\x20","\0","\x0B");
$content = str_replace($newlines, "", html_entity_decode($raw));
$start = strpos($content,"<div id='pageBack'");
$end = strpos($content, '</body>', $start) + strlen('</body>'); // include the whole closing tag
$results = substr($content,$start,$end-$start);
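// preg_replace() needs delimiters, "." and "?" must be escaped to match literally,
// and the function returns the result instead of changing $results in place
$pattern = '/ResultsDetails\.aspx\?/';
$replacement = 'results-scrape-details/';
$results = preg_replace($pattern, $replacement, $results);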
echo $results;
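As a minimal sketch of the string-based route the question takes: because "ResultsDetails.aspx?" is a fixed string, str_replace() is enough, and if a regular expression is preferred, preg_quote() escapes its special characters. The $html value below is a hypothetical sample standing in for $results.

```php
<?php
// Hypothetical sample markup standing in for $results from the question.
$html = '<a href="ResultsDetails.aspx?id=1">One</a> <a href="ResultsDetails.aspx?id=2">Two</a>';

// Option 1: literal replacement, no regular expression involved.
$viaStr = str_replace('ResultsDetails.aspx?', 'results-scrape-details/', $html);

// Option 2: the same with preg_replace(); preg_quote() escapes "." and "?"
// (and the "/" delimiter), so the pattern matches the text literally.
$pattern = '/' . preg_quote('ResultsDetails.aspx?', '/') . '/';
$viaRegex = preg_replace($pattern, 'results-scrape-details/', $html);

echo $viaStr, PHP_EOL;
```

Both variants rewrite the text wherever it occurs, not only inside href attributes, which is the main reason to prefer a DOM-based approach.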

Answer

Use a DOM parser such as PHP Simple HTML DOM instead of manipulating the raw string. It lets you select exactly the links you are looking for with jQuery-like selector syntax.

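// Load the library (the path depends on where simple_html_dom.php is installed)
include 'simple_html_dom.php';

// Create DOM object from HTML source
$dom = file_get_html('http://www.domain.com/path/to/page');
// Iterate all links whose href starts with ResultsDetails.aspx
foreach ($dom->find('a[href^=ResultsDetails.aspx]') as $node) {
    // Swap the page name for the new path, keeping the query string
    $node->href = str_replace('ResultsDetails.aspx?', 'results-scrape-details/', $node->href);
}
// Output modified DOM
echo $dom->outertext;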
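If adding a third-party library is undesirable, PHP's bundled DOM extension (DOMDocument plus DOMXPath) can do the same job. This is a sketch of an alternative, not part of the original answer; the sample markup is hypothetical, and the code keeps each link's query string by swapping only the "ResultsDetails.aspx?" prefix.

```php
<?php
// Hypothetical sample; in practice this would come from file_get_contents($url).
$html = '<html><body><div id="pageBack">'
      . '<a href="ResultsDetails.aspx?id=7">Result 7</a>'
      . '<a href="Other.aspx">Other</a>'
      . '</div></body></html>';

$prefix = 'ResultsDetails.aspx?';

$doc = new DOMDocument();
// Real-world markup is rarely valid; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// XPath 1.0 starts-with() selects only links whose href begins with the old page name.
foreach ($xpath->query('//a[starts-with(@href, "ResultsDetails.aspx?")]') as $link) {
    $old = $link->getAttribute('href');
    // Keep everything after the prefix (the query string), replace only the prefix.
    $link->setAttribute('href', 'results-scrape-details/' . substr($old, strlen($prefix)));
}

// Serialize just the pageBack div, similar to the substring the question extracted.
$div = $xpath->query('//div[@id="pageBack"]')->item(0);
$out = $doc->saveHTML($div);
echo $out, PHP_EOL;
```

Because the rewrite touches only href attributes of matching a elements, text elsewhere on the page that happens to contain "ResultsDetails.aspx?" is left alone.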
