Screen scraping an ASPX page using cURL
Problem description
I am using this code, and it's not working. Please help.
$url = "http://www.riogrande.com/Category/Findings-and-Finished-Jewelry/132/Bails-and-Enhancers/472";
$file=file_get_contents($url);
preg_match("#.*?#mis", $file, $arr_viewstate);
$viewstate = urlencode($arr_viewstate[1]);
$eventvalidation = urlencode($arr_viewstate[2]);
$options = array(
CURLOPT_RETURNTRANSFER => true, // return web page
CURLOPT_HEADER => false, // don't return headers
CURLOPT_ENCODING => "", // handle all encodings
CURLOPT_USERAGENT => "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.7) Gecko/20070914 Firefox/2.0.0.7'", // who am i
CURLOPT_AUTOREFERER => true, // set referer on redirect
CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
CURLOPT_TIMEOUT => 1120, // timeout on response
CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
CURLOPT_POST => true,
CURLOPT_VERBOSE => true,
CURLOPT_POSTFIELDS => '__EVENTTARGET='.urlencode('ctl00$ContentPlaceHolderBody$SearchPageNavigationTop$rptPager$ctl01').'&__EVENTARGUMENT='.urlencode('').'&__VIEWSTATE='.$viewstate.'&__EVENTVALIDATION='.$eventvalidation.'&__LASTFOCUS='.urlencode('')
);
$ch = curl_init($url);
curl_setopt_array($ch,$options);
Solution
To be honest, I don't understand what you want to achieve, but I do know that this is not the way to get __VIEWSTATE and __EVENTVALIDATION. It should be something like this:
$url = "http://www.riogrande.com/Category/Findings-and-Finished-Jewelry/132/Bails-and-Enhancers/472";
$html = file_get_contents($url);
// Capture the value attribute of each hidden ASP.NET state field.
preg_match('~<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="(.*?)" />~', $html, $viewstate);
preg_match('~<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="(.*?)" />~', $html, $eventvalidation);
// Index 1 holds the captured group; fall back to null if the field was not found.
$viewstate = $viewstate[1] ?? null;
$eventvalidation = $eventvalidation[1] ?? null;
var_dump($viewstate, $eventvalidation);
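The regular expressions above depend on the exact attribute order and spacing of the hidden inputs. As a rough sketch of a more tolerant variant, the hidden fields could be read with DOMDocument and encoded with http_build_query() before being handed to cURL. The helper names extract_hidden_fields() and build_postback_body() and the sample markup are made up for illustration and are not part of the original answer; only the event target string is taken from the question.

```php
<?php
// Hypothetical helper (not from the original answer): collect every hidden
// <input> so that all ASP.NET state fields can be posted back together.
function extract_hidden_fields(string $html): array
{
    $doc = new DOMDocument();
    // Real-world markup is rarely valid; keep parser warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $fields = [];
    foreach ($doc->getElementsByTagName('input') as $input) {
        $name = $input->getAttribute('name');
        if ($name !== '' && strtolower($input->getAttribute('type')) === 'hidden') {
            $fields[$name] = $input->getAttribute('value');
        }
    }
    return $fields;
}

// Hypothetical helper: build the URL-encoded body for the postback.
function build_postback_body(array $hidden, string $eventTarget, string $eventArgument = ''): string
{
    $hidden['__EVENTTARGET'] = $eventTarget;
    $hidden['__EVENTARGUMENT'] = $eventArgument;
    // http_build_query() URL-encodes every key and value.
    return http_build_query($hidden);
}

// Offline demonstration with a made-up page fragment (assumed markup).
$sample = '<form>'
    . '<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDw+abc=" />'
    . '<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="/wEWAg==" />'
    . '</form>';
$hidden = extract_hidden_fields($sample);
$body = build_postback_body(
    $hidden,
    'ctl00$ContentPlaceHolderBody$SearchPageNavigationTop$rptPager$ctl01'
);
echo $body, PHP_EOL;
```

With a real page, $body would be passed as CURLOPT_POSTFIELDS, followed by curl_exec($ch) to actually send the request. The snippet in the question, as shown, stops at curl_setopt_array() and never executes the handle.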