Scrape a particular area of site content behind a secure login


Question

I am trying to scrape a particular piece of text from a website that is protected by a login. Here is a tutorial on doing this with curl: http://www.digeratimarketing.co.uk/2008/12/16/curl-page-scraping-script/

But I am unable to work this into my own curl code. Here is my curl script:

$url     = "http://aftabcurrency.com/login_script.php";
$cookie  = 'cookies.txt';   // file used to store and resend the session cookie
$timeout = 30;              // overall request timeout, in seconds

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);    // return the page instead of printing it
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);    // follow any redirects
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);   // connect timeout should not exceed the overall timeout
curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);

// Submit the login form fields as a POST request
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, "user_name=user&user_password=pass&passcode=code");

$result = curl_exec($ch);
curl_close($ch);

$source = $result;
if (preg_match("/(CC3300\">)(.*?)(<\/font>)/is", $source, $found)) {
    echo $found[2];
} else {
    echo "Text not found.";
}

For example, on aftabcurrency.com I only want to scrape the text "Our Services Matters!" (this text changes every day).

Recommended answer

What I would do is "cut out" the text between a start marker and an end marker. In the page source, the text starts right after the font color 613A75 and ends with the closing </font> tag. Here is a regex solution:

$source = file_get_contents("http://aftabcurrency.com/index.php");
if (preg_match("/(613A75\">)(.*?)(<\/font>)/is", $source, $found)) {
    echo $found[2];
} else {
    echo "Text not found.";
}

If you want to do this with text inside the member area, add the code above to your script and replace the $source = file_get_contents(...) line with $source = $result, so that the regex runs on the page returned by your curl login.
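Putting the two pieces together might look like the sketch below. The function names fetch_member_page and extract_colored_text are hypothetical, introduced only for illustration. The URL and form field names come from the question, and the color value 613A75 comes from the answer. The member-area page may use a different color (the question's own regex looks for CC3300), so treat the marker as an assumption to check against the real page source.

```php
<?php
// Hypothetical helper: POST the login form and return the response body,
// or null if the request fails. Cookies are kept in $cookieFile.
function fetch_member_page(string $url, string $postFields, string $cookieFile): ?string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $postFields);
    $result = curl_exec($ch);
    return $result === false ? null : $result;
}

// Hypothetical helper: return the text between `<color>">` and the next
// closing </font> tag, or null if the marker is not found.
function extract_colored_text(string $source, string $color): ?string
{
    $pattern = '/' . preg_quote($color, '/') . '">(.*?)<\/font>/is';
    return preg_match($pattern, $source, $found) ? trim($found[1]) : null;
}

// Offline demonstration on a sample string (no network access needed).
$sample = '<td><font color="#613A75">Our Services Matters!</font></td>';
echo extract_colored_text($sample, '613A75') ?? 'Text not found.', "\n";
```

Against the real site, the call would be along the lines of $source = fetch_member_page("http://aftabcurrency.com/login_script.php", "user_name=user&user_password=pass&passcode=code", "cookies.txt"); followed by extract_colored_text($source, "613A75") once $source is known to be non-null.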

There are also other ways to do this: DOMDocument with XPath, or the simple PHP string functions strpos / strstr / substr.
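As a rough illustration of those two alternatives, here is a minimal sketch. The function names extract_with_xpath and extract_with_strpos are hypothetical. Both assume the target text sits in a font element whose color attribute contains 613A75, as described in the answer, and that $color is a plain hex string with no quote characters.

```php
<?php
// Alternative 1 (hypothetical helper): parse the HTML and query it with XPath.
function extract_with_xpath(string $html, string $color): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely well-formed, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // First <font> element whose color attribute contains the given value.
    $nodes = $xpath->query('//font[contains(@color, "' . $color . '")]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

// Alternative 2 (hypothetical helper): plain string search between two markers.
function extract_with_strpos(string $html, string $startMarker, string $endMarker): ?string
{
    $start = stripos($html, $startMarker);
    if ($start === false) {
        return null;
    }
    $start += strlen($startMarker);
    $end = stripos($html, $endMarker, $start);
    if ($end === false) {
        return null;
    }
    return trim(substr($html, $start, $end - $start));
}

// Offline demonstration on a sample string.
$sample = '<td><font color="#613A75">Our Services Matters!</font></td>';
echo extract_with_xpath($sample, '613A75') ?? 'Text not found.', "\n";
echo extract_with_strpos($sample, '613A75">', '</font>') ?? 'Text not found.', "\n";
```

The XPath version tends to survive changes in whitespace or attribute order better than a regex, while the strpos version has no dependency on the DOM extension and is the easiest to reason about when the markers are fixed.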
