Authorize with curl and parse using simple html dom not working
Problem description
I'm trying to read an HTML page with Simple HTML DOM, but the page requires a login.
For example, http://example.com/login/ is the login page and http://example.com/page/ is the page whose HTML I need to parse.
So I used curl to log in and Simple HTML DOM to parse.
But I can't tell whether the login succeeded, because when I display the response from curl I get the login page contents!
I searched Stack Overflow through almost all related questions for many hours but couldn't find what is going wrong.
Below is my code:
<?php
$curlPost['username']="username";
$curlPost['password']="pass";
$curlPost['token']="xxxxxxxxxx";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL , "http://example.com/login/");
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.A.B.C Safari/525.13");
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $curlPost);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_COOKIEFILE, "cookies.txt");
curl_setopt($ch, CURLOPT_COOKIEJAR, "cookies.txt");
$response= curl_exec ($ch);
curl_close($ch);
And the code to retrieve the HTML page:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL , "http://example.com/page/");
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.A.B.C Safari/525.13");
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_COOKIEFILE, "cookies.txt");
curl_setopt($ch, CURLOPT_COOKIEJAR, "cookies.txt");
$reponse= curl_exec ($ch);
curl_close($ch);
echo $response;
?>
Below is what I get in the response at the top of my page:
HTTP/1.1 302 Found
Date: Wed, 28 Jan 2015 06:59:44 GMT
Server: Apache
X-Powered-By: PHP/5.3.3
Cache-Control: no-cache
Location: /login
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8
HTTP/1.1 200 OK
Date: Wed, 28 Jan 2015 06:59:45 GMT
Server: Apache
X-Powered-By: PHP/5.3.3
Cache-Control: no-cache
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8
followed by the HTML contents of the login page.
Can anyone advise me on what I'm doing wrong?
I'm running this on my localhost, with the destination hosted on a server.
Also, I didn't see any changes to the "cookies.txt" file.
Many thanks.
That looks like normal output to me. If you don't want the headers in the response, don't set CURLOPT_HEADER.
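Two details in the posted code may also explain why the login page is what gets displayed. The second request stores its result in `$reponse`, but the script echoes `$response`, so the output shown is from the login request, not from /page/. The relative "cookies.txt" path also depends on the working directory, which may be why the file never changes. Below is a minimal sketch of the same flow with those points addressed. `fetch()` and `landedOnLogin()` are hypothetical helper names, the URLs and field names are copied from the question, and whether the site accepts a hard-coded token is not something the question confirms.

```php
<?php
// Sketch only: fetch() and landedOnLogin() are illustrative helpers,
// not part of curl or Simple HTML DOM.

/**
 * Run one request that shares a cookie jar; return [body, final URL].
 */
function fetch(string $url, string $cookieJar, ?array $post = null): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_HEADER         => false,      // keep headers out of the body
        CURLOPT_COOKIEFILE     => $cookieJar, // cookies are read from here
        CURLOPT_COOKIEJAR      => $cookieJar, // cookies are saved here
    ]);
    if ($post !== null) {
        curl_setopt($ch, CURLOPT_POST, true);
        // A string body is sent as application/x-www-form-urlencoded;
        // passing the array directly would send multipart/form-data.
        curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($post));
    }
    $body = curl_exec($ch);
    if ($body === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("curl failed: $error");
    }
    $finalUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    curl_close($ch);
    return [$body, $finalUrl];
}

/** True when the URL reached after redirects is still the login page. */
function landedOnLogin(string $finalUrl): bool
{
    $path = parse_url($finalUrl, PHP_URL_PATH);
    return rtrim((string) $path, '/') === '/login';
}

// Usage (left commented out because it performs real network requests):
//
// $jar = __DIR__ . '/cookies.txt';   // absolute path; must be writable
// fetch('http://example.com/login/', $jar, [
//     'username' => 'username',
//     'password' => 'pass',
//     'token'    => 'xxxxxxxxxx',
// ]);
// [$html, $finalUrl] = fetch('http://example.com/page/', $jar);
// if (landedOnLogin($finalUrl)) {
//     echo "Still on the login page, so the login was probably rejected.\n";
// } else {
//     echo $html;   // this string is what Simple HTML DOM would parse
// }
```

Checking the final URL instead of reading raw headers gives a quick yes/no on whether the session cookie was accepted. If the second request still ends on /login, the credentials or the token value are the next things to look at.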