Authorize with curl and parse using simple html dom not working


Question

I'm trying to read an HTML page that requires login authorization, using Simple HTML DOM.

For example, http://example.com/login/ is the login page and http://example.com/page/ is the page whose HTML I need to parse.

So I used curl to do the login and Simple HTML DOM to do the parsing.

But I don't know whether the login succeeds, because when I display the response from curl, it contains the login page's contents!

I searched through almost all the related questions on Stack Overflow for many hours, but I couldn't find what is going wrong.

Below is my code:

<?php
$curlPost['username']="username";
$curlPost['password']="pass";
$curlPost['token']="xxxxxxxxxx";

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL , "http://example.com/login/");
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.A.B.C Safari/525.13");
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $curlPost);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); 
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_COOKIEFILE, "cookies.txt");
curl_setopt($ch, CURLOPT_COOKIEJAR, "cookies.txt");
$response= curl_exec ($ch);
curl_close($ch);

And the code to retrieve the HTML page:

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL , "http://example.com/page/");
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.A.B.C Safari/525.13");
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_COOKIEFILE, "cookies.txt");
curl_setopt($ch, CURLOPT_COOKIEJAR, "cookies.txt");
$reponse= curl_exec ($ch);
curl_close($ch);

echo $response;
?>

Below is what I get in the response, at the top of my page:

HTTP/1.1 302 Found
Date: Wed, 28 Jan 2015 06:59:44 GMT
Server: Apache
X-Powered-By: PHP/5.3.3
Cache-Control: no-cache
Location: /login
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8

HTTP/1.1 200 OK
Date: Wed, 28 Jan 2015 06:59:45 GMT
Server: Apache
X-Powered-By: PHP/5.3.3
Cache-Control: no-cache
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8

followed by the login page's HTML contents.

Can anyone advise me on what I'm doing wrong?

I'm running this on my localhost, with the destination hosted on a server.

And I didn't see any changes to the "cookies.txt" file.

Many thanks.

Solution

That looks like normal output to me. If you don't want the headers, don't set CURLOPT_HEADER.
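
If the headers are still wanted for debugging, they can be separated from the body instead of being echoed along with it. The sketch below is only an illustration under stated assumptions: `splitCurlResponse` and `redirectTargets` are hypothetical helper names, not part of curl or Simple HTML DOM, and the sample data is modelled on the headers shown in the question. With a live handle, the header size would come from `curl_getinfo($ch, CURLINFO_HEADER_SIZE)`, read before `curl_close($ch)`. Note also that the posted code stores the second response in `$reponse` while `echo $response` prints the response to the login request, so the `Location: /login` redirect shown above appears to belong to the login POST itself.

```php
<?php
// Hypothetical helper (not from the original post): splits a raw curl
// response, captured with CURLOPT_HEADER enabled, into header blocks and body.
// $headerSize is assumed to be the value of CURLINFO_HEADER_SIZE.
function splitCurlResponse(string $raw, int $headerSize): array
{
    $headerText = substr($raw, 0, $headerSize);
    $body = (string) substr($raw, $headerSize);

    // With CURLOPT_FOLLOWLOCATION, every hop contributes one header block;
    // blocks are separated by an empty line (CRLF CRLF).
    $blocks = preg_split('/\r\n\r\n/', $headerText, -1, PREG_SPLIT_NO_EMPTY);

    return ['headers' => $blocks, 'body' => $body];
}

// Hypothetical helper: collects the Location header value of each hop, if any.
function redirectTargets(array $headerBlocks): array
{
    $targets = [];
    foreach ($headerBlocks as $block) {
        if (preg_match('/^Location:\s*(.+)$/mi', $block, $m)) {
            $targets[] = trim($m[1]);
        }
    }
    return $targets;
}

// Sample data modelled on the question's output (302 to /login, then 200).
$headers = "HTTP/1.1 302 Found\r\nLocation: /login\r\n\r\n"
         . "HTTP/1.1 200 OK\r\nContent-Type: text/html; charset=UTF-8\r\n\r\n";
$raw = $headers . "<html>login form</html>";

$parts = splitCurlResponse($raw, strlen($headers));
print_r(redirectTargets($parts['headers']));
```

A redirect target pointing back at the login path is a hint that the session was not established, which would also fit the unchanged "cookies.txt" file. Only `$parts['body']` should be handed to the HTML parser.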
