Screen scraping of an aspx page using curl

Problem description

I am using this code, and it's not working. Please help.

$url = "http://www.riogrande.com/Category/Findings-and-Finished-Jewelry/132/Bails-and-Enhancers/472";
$file=file_get_contents($url);
preg_match("#.*?#mis", $file, $arr_viewstate);
$viewstate = urlencode($arr_viewstate[1]);
$eventvalidation = urlencode($arr_viewstate[2]);
$options = array(
    CURLOPT_RETURNTRANSFER => true, // return web page
    CURLOPT_HEADER => false, // don't return headers
    CURLOPT_ENCODING => "", // handle all encodings
    CURLOPT_USERAGENT => "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.7) Gecko/20070914 Firefox/2.0.0.7", // who am i
    CURLOPT_AUTOREFERER => true, // set referer on redirect
    CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
    CURLOPT_TIMEOUT => 1120, // timeout on response
    CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
    CURLOPT_POST => true,
    CURLOPT_VERBOSE => true,
    CURLOPT_POSTFIELDS => '__EVENTTARGET='.urlencode('ctl00$ContentPlaceHolderBody$SearchPageNavigationTop$rptPager$ctl01').'&__EVENTARGUMENT='.urlencode('').'&__VIEWSTATE='.$viewstate.'&__EVENTVALIDATION='.$eventvalidation.'&__LASTFOCUS='.urlencode('')
);

$ch = curl_init($url);
curl_setopt_array($ch,$options);
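As shown, the snippet configures the handle but stops after `curl_setopt_array()`, so no request is actually sent: that only happens when `curl_exec()` is called. Below is a minimal sketch of the same postback. It assumes the raw (not yet URL-encoded) `__VIEWSTATE` and `__EVENTVALIDATION` values are already in hand. `build_postback_body()` and `send_postback()` are hypothetical helper names, and the timeouts are illustrative. `http_build_query()` URL-encodes every key and value, which replaces the manual `urlencode()` concatenation. For the same reason, values passed to it must not be encoded a second time beforehand.

```php
<?php
// Sketch only: build_postback_body() and send_postback() are hypothetical helpers.

// Build the form body an ASP.NET postback expects.
// Pass RAW values: http_build_query() does the URL-encoding itself.
function build_postback_body(string $eventTarget, string $viewState, string $eventValidation): string
{
    return http_build_query([
        '__EVENTTARGET'     => $eventTarget,
        '__EVENTARGUMENT'   => '',
        '__VIEWSTATE'       => $viewState,
        '__EVENTVALIDATION' => $eventValidation,
        '__LASTFOCUS'       => '',
    ]);
}

// POST the body and return the response HTML.
function send_postback(string $url, string $body): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the page instead of printing it
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => $body,
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects after the POST
        CURLOPT_MAXREDIRS      => 10,
        CURLOPT_CONNECTTIMEOUT => 30,    // illustrative values
        CURLOPT_TIMEOUT        => 120,
    ]);
    $response = curl_exec($ch);          // the request is only sent here
    if ($response === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $response;                    // $ch is released when it goes out of scope
}
```

A call would look like `send_postback($url, build_postback_body('ctl00$ContentPlaceHolderBody$SearchPageNavigationTop$rptPager$ctl01', $viewstate, $eventvalidation))`, where `$viewstate` and `$eventvalidation` hold the raw field values.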

Solution

The truth is that I don't understand what you want to achieve, but I definitely know that this is not the way to get __VIEWSTATE and __EVENTVALIDATION.

It should be something like this:

$url = "http://www.riogrande.com/Category/Findings-and-Finished-Jewelry/132/Bails-and-Enhancers/472";
$html = file_get_contents($url);

preg_match('~<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="(.*?)" />~',$html,$viewstate);
preg_match('~<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="(.*?)" />~',$html,$eventvalidation);

$viewstate = $viewstate[1];
$eventvalidation = $eventvalidation[1];

var_dump($viewstate,$eventvalidation);
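The regular expressions above only match if the page emits the attributes in exactly that order, spacing, and quoting. Below is a sketch of a more tolerant alternative that uses PHP's DOM extension instead of a regex: it collects every hidden `<input>` into a name => value array. `extract_hidden_fields()` is a hypothetical helper name, and it assumes the page writes `type="hidden"` in lowercase, as ASP.NET normally does.

```php
<?php
// Sketch only: extract_hidden_fields() is a hypothetical helper.
// Returns [name => value] for every <input type="hidden" name="..."> in the page.
function extract_hidden_fields(string $html): array
{
    if ($html === '') {
        return [];                       // loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // real-world HTML triggers parser warnings
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath  = new DOMXPath($doc);
    $fields = [];
    // Attribute order and quoting no longer matter, unlike with the regex.
    foreach ($xpath->query('//input[@type="hidden"][@name]') as $input) {
        // getAttribute() returns the value with HTML entities already decoded.
        $fields[$input->getAttribute('name')] = $input->getAttribute('value');
    }
    return $fields;
}
```

With `$fields = extract_hidden_fields($html);`, the two values are `$fields['__VIEWSTATE']` and `$fields['__EVENTVALIDATION']`. Because the whole array is returned, any other hidden fields the form carries are also available if the postback turns out to need them.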
