Get HTML from page after POST


Problem description

I want to extract data from a page with the DOMCrawler of Symfony2. This is the page where I want to get data from: http://kovv.mavari.be/kalender.aspx

What I actually need is the page returned after the form is posted, that is, after clicking 'zoek' with nothing selected in the dropdowns. Right now I have: $html = file_get_contents("http://kovv.mavari.be/kalender.aspx");

But that obviously loads the initial page without a POST. Is there a way to load the page with a POST request, or do I need to save the page to my local drive first?

UPDATE:
This is my code now:

$post = http_build_query(array(
    'ctl00$ContentPlaceHolder1$ddlGeslacht' => 'Heren',
    'ctl00$ContentPlaceHolder1$ddlReeks' => '',
    'ctl00_ContentPlaceHolder1_ddlDatum' => ''
));

$options= array('http' => array(
    'method'  => 'POST',
    'header'  => 'Content-type: application/x-www-form-urlencoded',
    'content' => $post
));

$context  = stream_context_create($options);
$html = file_get_contents('http://kovv.mavari.be/kalender.aspx', false, $context);

But the HTML is still unchanged; it's still the first page, without the POST.

UPDATE 2: This is what I have now:

$url = "http://kovv.mavari.be/kalender.aspx";
$regs=array();

$cookies = '../src/VolleyScout/VolleyScoutBundle/Resources/doc/cookie.txt';

// regular expressions to parse out the special ASP.NET
// values for __VIEWSTATE and __EVENTVALIDATION
$regexViewstate = '/__VIEWSTATE\" value=\"(.*)\"/i';
$regexEventVal  = '/__EVENTVALIDATION\" value=\"(.*)\"/i';

$ch = curl_init();

curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
$data=curl_exec($ch);

$viewstate = $this->regexExtract($data,$regexViewstate,$regs,1);
$eventval = $this->regexExtract($data, $regexEventVal,$regs,1);

$postData = '__VIEWSTATE='.rawurlencode($viewstate)
    .'&__EVENTVALIDATION='.rawurlencode($eventval)
    .'&ctl00_ContentPlaceHolder1_ddlGeslacht=Heren'
    .'&ctl00$ContentPlaceHolder1$ddlReeks'
    .'&ctl00_ContentPlaceHolder1_ddlDatum'
    .'&ctl00$ContentPlaceHolder1$btnZoek:zoek'
;

curl_setOpt($ch, CURLOPT_POST, TRUE);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookies);

curl_setOpt($ch, CURLOPT_POST, FALSE);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookies);

$data = curl_exec($ch);

echo $data;

curl_close($ch);

But I still get the page without the POST. Am I missing something?

Solution

You have to use the context parameter of file_get_contents and pass it a stream context in order to send a POST request.

$post = http_build_query(array(
    'ctl00$ContentPlaceHolder1$ddlGeslacht' => '...',
    'ctl00$ContentPlaceHolder1$ddlReeks' => '...',
    // ...
));

$options= array('http' => array(
    'method'  => 'POST',
    'header'  => 'Content-type: application/x-www-form-urlencoded',
    'content' => $post
));

$context  = stream_context_create($options);
// Keep the response so it can be parsed afterwards
$html = file_get_contents('http://kovv.mavari.be/kalender.aspx', false, $context);
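
If the POST still returns the initial page, one possible cause (an assumption, not something the answer above confirms) is that the page is an ASP.NET WebForms page, which typically only treats a request as a postback when the hidden __VIEWSTATE and __EVENTVALIDATION fields from a previous GET are sent back with it. The sketch below shows one way to collect those hidden fields with PHP's built-in DOM extension and merge them into the POST body. The helper names extractHiddenFields, buildPostBody and postForm are hypothetical, and the field names in the usage comment are taken from the question, not verified against the live page.

```php
<?php
// Sketch only: the helper names are hypothetical, not part of any library.

/**
 * Collect the name/value pairs of all hidden inputs in an HTML page.
 * On ASP.NET WebForms pages this includes __VIEWSTATE and __EVENTVALIDATION.
 */
function extractHiddenFields(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $fields = [];
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//input[@type="hidden"][@name]') as $input) {
        $fields[$input->getAttribute('name')] = $input->getAttribute('value');
    }
    return $fields;
}

/** Merge the hidden fields with the visible form values and URL-encode them. */
function buildPostBody(array $hiddenFields, array $formValues): string
{
    return http_build_query(array_merge($hiddenFields, $formValues));
}

/** Send the body as a POST through a stream context; returns the HTML or false. */
function postForm(string $url, string $body)
{
    $context = stream_context_create(['http' => [
        'method'  => 'POST',
        'header'  => 'Content-type: application/x-www-form-urlencoded',
        'content' => $body,
    ]]);
    return file_get_contents($url, false, $context);
}

// Usage (performs network requests, so it is left commented out):
// $url   = 'http://kovv.mavari.be/kalender.aspx';
// $first = file_get_contents($url);               // plain GET to obtain the hidden fields
// $body  = buildPostBody(extractHiddenFields($first), [
//     'ctl00$ContentPlaceHolder1$ddlGeslacht' => 'Heren',
//     'ctl00$ContentPlaceHolder1$btnZoek'     => 'zoek',   // assumed button name and value
// ]);
// $html  = postForm($url, $body);
```

Note that the keys use the inputs' name attributes (with `$` separators), not their id attributes (with `_`), because browsers submit form data by name. In the cURL attempt from the question, CURLOPT_POST is also set back to FALSE before the second curl_exec, which would likely turn that request into a GET again.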
