除了 CURLOPT_COOKIEFILE 之外,如何使用 PHP curl 发送 cookie? [英] How can I send cookies using PHP curl in addition to CURLOPT_COOKIEFILE?

查看:30
本文介绍了除了 CURLOPT_COOKIEFILE 之外,如何使用 PHP curl 发送 cookie?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在提交表单后从网站上抓取了一些内容.问题是脚本时不时地失败,假设脚本失败 5 次中有 2 次失败.我正在使用 php curl、COOKIEFILE 和 COOKIEJAR 来处理 cookie.但是,当我观察浏览器发送的标头(从浏览器访问目标网站并使用实时 http 标头时)和 php 发送的标头时,发现有很多差异.

I am scraping some content from a website after a form submission. The problem is that the script is failing every now and then, say 2 times out of 5 the script fails. I am using php curl, COOKIEFILE and COOKIEJAR to handle the cookie. However when I observed the sent headers of my browser (when visiting the target website from my browser and using live http headers) and the headers sent by php and saw there are many differences.

我的浏览器发送的 cookie 变量比 php curl 多得多.我认为这种差异可能是因为 javascript 负责设置大部分 cookie,但我不确定这一点.

My browser sent a lot more cookie variables than php curl. I think this difference might be because javascript is resposible for setting most of the cookies, however I'm not sure about this.

我正在使用以下代码进行抓取,并显示我的浏览器和 php curl 发送的标头:

I am using the below code to do the scraping and I am showing the sent headers of my browser and of php curl:

$ckfile = tempnam ("/tmp", 'cookiename');

$url = 'https://www.domain.com/firststep';
$poststring = 'variable1=4&variable2=5';
$ch = curl_init ($url);
curl_setopt ($ch, CURLOPT_COOKIEJAR, $ckfile);
curl_setopt ($ch, CURLOPT_COOKIEFILE, $ckfile);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt ($ch, CURLOPT_POST, 1);
curl_setopt ($ch, CURLOPT_POSTFIELDS, $poststring);
$output = curl_exec ($ch);
curl_close($ch);



$url = 'https://www.domain.com/nextstep';
$poststring = 'variableB1=4&variableB2=5';
$ch = curl_init ($url);
curl_setopt ($ch, CURLOPT_COOKIEJAR, $ckfile);
curl_setopt ($ch, CURLOPT_COOKIEFILE, $ckfile);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt ($ch, CURLOPT_POST, 1);
curl_setopt ($ch, CURLOPT_POSTFIELDS, $poststring);
curl_setopt($ch, CURLINFO_HEADER_OUT, true);
$output = curl_exec ($ch);
$headers = curl_getinfo($ch, CURLINFO_HEADER_OUT);
curl_close($ch);

print_r($headers);

// Gives:
POST /d-cobs-web/doffers.html;jsessionid=7BC2A5277A4EB07D9A7237A707BE1366 HTTP/1.1
User-Agent: Mozilla
Host: domain.subdomain.nl
Accept: */*
Cookie: JSESSIONID=7BC2A5277A4EB07D9A7237A707BE1366; www-20480=MIFBNLFDFAAA
Content-Length: 187
Content-Type: application/x-www-form-urlencoded

// Where live http headers gives:
POST /d-cobs-web/doffers.html;jsessionid=7BC2A5277A4EB07D9A7237A707BE1366 HTTP/1.1
Host: domain.subdomain.nl
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:21.0) Gecko/20100101 Firefox/21.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: nl,en-us;q=0.7,en;q=0.3
Accept-Encoding: gzip, deflate
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
Referer: https://domain.subdomain.nl/dd/doffers.html?returnUrl=https%3A%2F%2Fttcc.subdomain.nl%2Fdd%2Fpreferences.html%3FValueChanged%3Dfalse&BEGBA=&departureDate=13-06-2013&extChangeTime=&pax2=0&bp=&pax1=1&pax4=0&bk=&pax3=0&shopId=&xtpage=&partner=NSINT&bc=&xt_pc=&ov=&departureTime=&comfortClass=2&destination=DEBHF&thalysTicketless=&beneUser=&debugDOffer=&logonId=&valueChanged=&iDomesticOrigin=&rp=&returnTime=&locale=nl_NL&vu=&thePassWeekend=false&returnDate=&xtsite=&pax=A&lc2=&lc1=&lc4=&lc3=&lc6=&lc5=&BECRA=&passType2=&custId=&lc9=&iDomesticDestination=&passType1=A&lc7=&lc8=&origin=NLASC&toporef=&pid=&passType4=&returnTimeType=1&passType3=&departureTimeType=1&socusId=&idr3=&xtn2=&loyaltyCard=&idr2=&idr1=&thePassBusiness=false&cid=14812
Content-Length: 219
Cookie: subdomainPARTNER=NSINT; JSESSIONID=CB3FEB3AC72AD61A80BFED91D3FD96CA; www-20480=MHFBNLFDFAAA; campaignPos=5; www-47873=MGFBNLFDFAAA; __utma=1.993399624.1370027094.1370040145.1370082133.5; __utmc=1; __utmz=1.1370027094.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); BCSessionID=5dc05787-c2c8-43e1-9abe-93989970b087; BCPermissionLevel=PERSONAL; __utmb=1.1.10.1370082133
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
AJAXREQUEST=_viewRoot&doffersForm=doffersForm&doffersForm%3AvalueChanged=&doffersForm%3ArequestValid=true&javax.faces.ViewState=j_id3&doffersForm%3Aj_id937=doffersForm%3Aj_id937&valueChanged=false&AJAX%3AEVENTS_COUNT=1&

我想使用:

$headers   = array();
$headers[] = 'Cookie: ' . $cookie;

和:

curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);

哪里:

$cookie = 'subdomainPARTNER=NSINT; JSESSIONID=CB3FEB3AC72AD61A80BFED91D3FD96CA; www-20480=MHFBNLFDFAAA; campaignPos=5; www-47873=MGFBNLFDFAAA; __utma=1.993399624.1370027094.1370040145.1370082133.5; __utmc=1; __utmz=1.1370027094.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); BCSessionID=5dc05787-c2c8-43e1-9abe-93989970b087; BCPermissionLevel=PERSONAL; __utmb=1.1.10.1370082133';

上面 cookie 中的一些参数我可能能够从网站的内容中抓取,但不是全部.其中一些我可能能够从 $ckfile 中读取,但我不知道如何做到这一点.特别是 utma utmc、utmz、utmcsr、utmccn、utmcmd 我无法从任何地方获得,我认为这些是由 javascript 生成的.

Some of the parameters in the cookie above I might be able to scrape from the content of the website, but not all. Some of them I might be able to read from the $ckfile, but I don't know how to do that. Especially the utma utmc, utmz, utmcsr, utmccn, utmcmd I am not able to get from anywhere, I think these are generated by the javascript.

问题 1:我是否对当前代码中的 cookie 处理做错了什么,因为 php curl 发送的 cookie 变量很少,而浏览器发送的 cookie 变量很多?进一步:浏览器和 php curl 发送的标头之间的其他差异是否会成为返回正确内容的问题?

Question 1: Am I doing something wrong with the cookie handling in the current code as very few cookie variables are sent by php curl and a lot more by the browser? Further: can other differences between sent headers by browser and php curl be a problem to return the right content?

问题 2:是否由于 javascript 设置了这些 cookie 而丢失了 cookie 变量?

Question 2: Are the missing cookie variables due to the javascript setting those cookies?

问题 3:处理 cookie 以确保将所有必需的 cookie 发送到远程服务器的最佳方法是什么?

Question 3: What is the best way to handle the cookies to make sure that all required cookies are being sent to the remote server?

非常欢迎您的帮助!

推荐答案

如果 cookie 是从脚本生成的,那么您可以手动将 cookie 与来自文件的 cookie 一起发送(使用 cookie-file 选项).例如:

If the cookie is generated from script, then you can send the cookie manually along with the cookie from the file(using cookie-file option). For example:

# sending manually set cookie
curl_setopt($ch, CURLOPT_HTTPHEADER, array("Cookie: test=cookie"));

# sending cookies from file
curl_setopt($ch, CURLOPT_COOKIEFILE, $ckfile);

在这种情况下,curl 会将您定义的 cookie 与文件中的 cookie 一起发送.

In this case curl will send your defined cookie along with the cookies from the file.

如果cookie是通过javascrript生成的,那么你必须追踪它是如何生成的,然后你可以使用上面的方法(通过http-header)发送它.

If the cookie is generated through javascrript, then you have to trace it out how its generated and then you can send it using the above method(through http-header).

当从 Mozilla 发送 cookie 时会看到 utma utmc, utmz.你不应该再担心这些事情了.

The utma utmc, utmz are seen when cookies are sent from Mozilla. You shouldn't bet worry about these things anymore.

最后,你的做法没问题.只要确保您使用的是文件名的绝对路径(即 /var/dir/cookie.txt)而不是相对路径.

Finally, the way you are doing is alright. Just make sure you are using absolute path for the file names(i.e. /var/dir/cookie.txt) instead of relative one.

在使用 curl 时始终启用详细模式.它将对您跟踪请求有很大帮助.它还可以节省您的大量时间.

Always enable the verbose mode when working with curl. It will help you a lot on tracing the requests. Also it will save lot of your times.

curl_setopt($ch, CURLOPT_VERBOSE, true);

这篇关于除了 CURLOPT_COOKIEFILE 之外,如何使用 PHP curl 发送 cookie?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆