Get HTML from page after POST
Problem description
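I want to extract data from a page with the DOMCrawler of Symfony2. This is the page I want to get data from: http://kovv.mavari.be/kalender.aspx
But I want the page as it looks after a post, when you click on 'zoek' (with no parameters selected in the dropdowns); that's the page I want! Now I have:
$html = file_get_contents("http://kovv.mavari.be/kalender.aspx");
But obviously this loads the first page without a post. Is there a way to load the page with a post? Or do I need to save the page to my local drive first?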
UPDATE:
This is my code now:
$post = http_build_query(array(
    'ctl00$ContentPlaceHolder1$ddlGeslacht' => 'Heren',
    'ctl00$ContentPlaceHolder1$ddlReeks' => '',
    'ctl00_ContentPlaceHolder1_ddlDatum' => ''
));
$options = array('http' => array(
    'method' => 'POST',
    'header' => 'Content-type: application/x-www-form-urlencoded',
    'content' => $post
));
$context = stream_context_create($options);
$html = file_get_contents('http://kovv.mavari.be/kalender.aspx', false, $context);
But the HTML is still unchanged; it's still the first page without the post.
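The POST above most likely still returns the first page because the target looks like an ASP.NET WebForms page (the `.aspx` URL and the `ctl00$ContentPlaceHolder1$…` control names suggest so). Such a page generally only treats a request as a postback, and only runs the 'zoek' button's handler, when the body also sends back the hidden state fields from the GET response (`__VIEWSTATE`, `__EVENTVALIDATION`, and on newer versions `__VIEWSTATEGENERATOR`) together with the clicked button's `name=value` pair. Note also that the third key above uses the client-side `id` form (`ctl00_…`) rather than the `name` form (`ctl00$…`) that the server reads. The following is a minimal sketch. It assumes the hidden fields are plain `<input type="hidden">` elements, and `buildPostbackBody` is a hypothetical helper built on PHP's bundled DOM extension:

```php
<?php
// Sketch: build an ASP.NET WebForms postback body from the HTML of the first GET.
// buildPostbackBody() is a hypothetical helper, not part of PHP or Symfony.
function buildPostbackBody(string $formHtml, array $formValues): string
{
    $hidden = [];
    if ($formHtml !== '') {                 // DOMDocument::loadHTML() rejects empty input
        $doc = new DOMDocument();
        $previous = libxml_use_internal_errors(true); // real pages are seldom valid HTML
        $doc->loadHTML($formHtml);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        // Every named hidden input: __VIEWSTATE, __VIEWSTATEGENERATOR, __EVENTVALIDATION, ...
        foreach ((new DOMXPath($doc))->query('//input[@type="hidden"][@name]') as $input) {
            // getAttribute() returns the entity-decoded attribute value.
            $hidden[$input->getAttribute('name')] = $input->getAttribute('value');
        }
    }
    // Form values (dropdowns and the clicked button's name => value) override hidden ones.
    // http_build_query() percent-encodes '$' and the base64 characters of the state fields.
    return http_build_query(array_merge($hidden, $formValues));
}
```

For example, `buildPostbackBody($firstPage, ['ctl00$ContentPlaceHolder1$btnZoek' => 'zoek'])` (where `$firstPage` is the HTML from a plain GET) returns a string that can be used as the `'content'` option of the stream context. The button value `zoek` is taken from the question and should be checked against the page source.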
UPDATE 2: This is what I have now:
$url = "http://kovv.mavari.be/kalender.aspx";
$regs = array();
$cookies = '../src/VolleyScout/VolleyScoutBundle/Resources/doc/cookie.txt';
// regular expressions to parse out the special ASP.NET
// values for __VIEWSTATE and __EVENTVALIDATION
$regexViewstate = '/__VIEWSTATE\" value=\"(.*)\"/i';
$regexEventVal = '/__EVENTVALIDATION\" value=\"(.*)\"/i';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
$data = curl_exec($ch);
$viewstate = $this->regexExtract($data, $regexViewstate, $regs, 1);
$eventval = $this->regexExtract($data, $regexEventVal, $regs, 1);
$postData = '__VIEWSTATE='.rawurlencode($viewstate)
    .'&__EVENTVALIDATION='.rawurlencode($eventval)
    .'&ctl00_ContentPlaceHolder1_ddlGeslacht=Heren'
    .'&ctl00$ContentPlaceHolder1$ddlReeks'
    .'&ctl00_ContentPlaceHolder1_ddlDatum'
    .'&ctl00$ContentPlaceHolder1$btnZoek:zoek'
;
curl_setOpt($ch, CURLOPT_POST, TRUE);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookies);
curl_setOpt($ch, CURLOPT_POST, FALSE);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookies);
$data = curl_exec($ch);
echo $data;
curl_close($ch);
But I still get the page without a post. Am I missing something?
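In the UPDATE 2 code, the second request never becomes a POST. `CURLOPT_POST` is set to `TRUE` and then back to `FALSE` before the second `curl_exec()`, so a plain GET is sent. There are also several smaller problems: the cookie options are set only after the first request, so any cookie it returns is not kept; several pairs in `$postData` have no `=`; the button uses `:` instead of `=`; and two keys use the `ctl00_` id form. The sketch below keeps the question's regex idea but fixes the order: GET once, read the hidden fields, then POST on the same handle. `hiddenFieldValue` and `postBackWithCurl` are hypothetical names. The `$` form of the `ddlGeslacht` and `ddlDatum` names is inferred from the other controls, and it is an assumption that the site needs a session cookie at all:

```php
<?php
// Sketch: the UPDATE 2 flow with the request order fixed (GET, then POST).
// hiddenFieldValue() and postBackWithCurl() are hypothetical helper names.

// Read one hidden field like the question's regexes do, but with [^"]* so the
// capture cannot run past the closing quote of the value attribute.
function hiddenFieldValue(string $html, string $name): ?string
{
    $pattern = '/name="' . preg_quote($name, '/') . '"[^>]*?value="([^"]*)"/i';
    return preg_match($pattern, $html, $m) ? html_entity_decode($m[1], ENT_QUOTES) : null;
}

function postBackWithCurl(string $url, array $formValues): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        // '' switches on curl's in-memory cookie engine before the first request,
        // so a cookie set by the GET is sent back with the POST.
        CURLOPT_COOKIEFILE     => '',
    ]);

    // Step 1: GET the form to read the state fields ASP.NET expects back.
    $page = curl_exec($ch);
    if ($page === false) {
        throw new RuntimeException('GET failed: ' . curl_error($ch));
    }
    $body = $formValues;
    foreach (['__VIEWSTATE', '__VIEWSTATEGENERATOR', '__EVENTVALIDATION'] as $field) {
        $value = hiddenFieldValue($page, $field);
        if ($value !== null) { // not every page renders all three fields
            $body[$field] = $value;
        }
    }

    // Step 2: switch to POST and execute while CURLOPT_POST is still on.
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($body));
    $html = curl_exec($ch);
    if ($html === false) {
        throw new RuntimeException('POST failed: ' . curl_error($ch));
    }
    return $html; // the handle is released when $ch goes out of scope
}

// Example call (not run here). Keys are the controls' name attributes; the button
// value 'zoek' comes from the question and may need to match the page source exactly.
// $html = postBackWithCurl('http://kovv.mavari.be/kalender.aspx', [
//     'ctl00$ContentPlaceHolder1$ddlGeslacht' => 'Heren',
//     'ctl00$ContentPlaceHolder1$ddlReeks'    => '',
//     'ctl00$ContentPlaceHolder1$ddlDatum'    => '',
//     'ctl00$ContentPlaceHolder1$btnZoek'     => 'zoek',
// ]);
```

Reusing one handle keeps the URL and the cookie engine across both requests. The only option that changes between the two `curl_exec()` calls is the switch to POST.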
Answer
You have to use the context parameter of file_get_contents and pass a stream context object to send a POST request:
$post = http_build_query(array(
    'ctl00$ContentPlaceHolder1$ddlGeslacht' => '...',
    'ctl00$ContentPlaceHolder1$ddlReeks' => '...',
    // ...
));
$options = array('http' => array(
    'method' => 'POST',
    'header' => 'Content-type: application/x-www-form-urlencoded',
    'content' => $post
));
$context = stream_context_create($options);
$html = file_get_contents('http://kovv.mavari.be/kalender.aspx', false, $context);
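If you stay with `file_get_contents` as suggested above, note that it keeps no cookies between calls. The sketch below shows a GET-then-POST round trip. It assumes the site sets a session cookie; ASP.NET sites commonly use `ASP.NET_SessionId`, but this is not verified for this site. The `Set-Cookie` lines from the GET response, which PHP exposes in `$http_response_header`, are turned into a `Cookie` header for the POST. `cookieHeaderFromResponse` and `postWithStreamContext` are hypothetical helpers, and the POST fields still need the hidden ASP.NET state fields taken from the GET:

```php
<?php
// Sketch: GET-then-POST with stream contexts, forwarding cookies by hand.
// Both functions are hypothetical helpers, not part of PHP or Symfony.

// Turn "Set-Cookie: name=value; path=/; ..." response lines into one Cookie header.
function cookieHeaderFromResponse(array $responseHeaders): ?string
{
    $pairs = [];
    foreach ($responseHeaders as $line) {
        if (stripos($line, 'Set-Cookie:') === 0) {
            // Keep only "name=value"; attributes such as path or HttpOnly are dropped.
            $pair = trim(explode(';', substr($line, strlen('Set-Cookie:')), 2)[0]);
            if ($pair !== '') {
                $pairs[] = $pair;
            }
        }
    }
    return $pairs ? 'Cookie: ' . implode('; ', $pairs) : null;
}

// POST url-encoded fields, optionally with a Cookie header, and return the body.
function postWithStreamContext(string $url, array $fields, ?string $cookieHeader = null): string
{
    $headers = ['Content-Type: application/x-www-form-urlencoded'];
    if ($cookieHeader !== null) {
        $headers[] = $cookieHeader;
    }
    $context = stream_context_create(['http' => [
        'method'  => 'POST',
        'header'  => implode("\r\n", $headers),
        'content' => http_build_query($fields),
    ]]);
    $html = file_get_contents($url, false, $context);
    if ($html === false) {
        throw new RuntimeException("POST to $url failed");
    }
    return $html;
}

// Usage (not run here):
// $firstPage = file_get_contents('http://kovv.mavari.be/kalender.aspx');
// $cookie = cookieHeaderFromResponse($http_response_header);
// $fields = [/* hidden __VIEWSTATE etc. from $firstPage, plus the dropdowns and button */];
// $html = postWithStreamContext('http://kovv.mavari.be/kalender.aspx', $fields, $cookie);
```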