除了CURLOPT_COOKIEFILE之外,如何使用PHP curl发送Cookie? [英] How can I send cookies using PHP curl in addition to CURLOPT_COOKIEFILE?

查看:312
本文介绍了除了CURLOPT_COOKIEFILE之外,如何使用PHP curl发送Cookie?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在提交表单后从网站中提取一些内容。问题是,脚本失败,每次,说说2次5的脚本失败。我使用php curl,COOKIEFILE和COOKIEJAR来处理cookie。但是,当我观察到我的浏览器发送的标题(当从我的浏览器访问目标网站和使用实时http头)和php发送的头,看到有很多的区别。

I am scraping some content from a website after a form submission. The problem is that the script is failing every now and then, say 2 times out of 5 the script fails. I am using php curl, COOKIEFILE and COOKIEJAR to handle the cookie. However when I observed the sent headers of my browser (when visiting the target website from my browser and using live http headers) and the headers sent by php and saw there are many differences.

我的浏览器发送的Cookie变数多于PHP curl。我认为这个差异可能是因为javascript是设置大多数cookie的反对,但我不知道这个。

My browser sent a lot more cookie variables than php curl. I think this difference might be because javascript is resposible for setting most of the cookies, however I'm not sure about this.

我使用下面的代码来做抓取和我显示我的浏览器和php卷发发送的标题:

I am using the below code to do the scraping and I am showing the sent headers of my browser and of php curl:

$ckfile = tempnam ("/tmp", 'cookiename');

$url = 'https://www.domain.com/firststep';
$poststring = 'variable1=4&variable2=5';
$ch = curl_init ($url);
curl_setopt ($ch, CURLOPT_COOKIEJAR, $ckfile);
curl_setopt ($ch, CURLOPT_COOKIEFILE, $ckfile);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt ($ch, CURLOPT_POST, 1);
curl_setopt ($ch, CURLOPT_POSTFIELDS, $poststring);
$output = curl_exec ($ch);
curl_close($ch);



$url = 'https://www.domain.com/nextstep';
$poststring = 'variableB1=4&variableB2=5';
$ch = curl_init ($url);
curl_setopt ($ch, CURLOPT_COOKIEJAR, $ckfile);
curl_setopt ($ch, CURLOPT_COOKIEFILE, $ckfile);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt ($ch, CURLOPT_POST, 1);
curl_setopt ($ch, CURLOPT_POSTFIELDS, $poststring);
curl_setopt($ch, CURLINFO_HEADER_OUT, true);
$output = curl_exec ($ch);
$headers = curl_getinfo($ch, CURLINFO_HEADER_OUT);
curl_close($ch);

print_r($headers);

// Gives:
POST /d-cobs-web/doffers.html;jsessionid=7BC2A5277A4EB07D9A7237A707BE1366 HTTP/1.1
User-Agent: Mozilla
Host: domain.subdomain.nl
Accept: */*
Cookie: JSESSIONID=7BC2A5277A4EB07D9A7237A707BE1366; www-20480=MIFBNLFDFAAA
Content-Length: 187
Content-Type: application/x-www-form-urlencoded

// Where live http headers gives:
POST /d-cobs-web/doffers.html;jsessionid=7BC2A5277A4EB07D9A7237A707BE1366 HTTP/1.1
Host: domain.subdomain.nl
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:21.0) Gecko/20100101 Firefox/21.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: nl,en-us;q=0.7,en;q=0.3
Accept-Encoding: gzip, deflate
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
Referer: https://domain.subdomain.nl/dd/doffers.html?returnUrl=https%3A%2F%2Fttcc.subdomain.nl%2Fdd%2Fpreferences.html%3FValueChanged%3Dfalse&BEGBA=&departureDate=13-06-2013&extChangeTime=&pax2=0&bp=&pax1=1&pax4=0&bk=&pax3=0&shopId=&xtpage=&partner=NSINT&bc=&xt_pc=&ov=&departureTime=&comfortClass=2&destination=DEBHF&thalysTicketless=&beneUser=&debugDOffer=&logonId=&valueChanged=&iDomesticOrigin=&rp=&returnTime=&locale=nl_NL&vu=&thePassWeekend=false&returnDate=&xtsite=&pax=A&lc2=&lc1=&lc4=&lc3=&lc6=&lc5=&BECRA=&passType2=&custId=&lc9=&iDomesticDestination=&passType1=A&lc7=&lc8=&origin=NLASC&toporef=&pid=&passType4=&returnTimeType=1&passType3=&departureTimeType=1&socusId=&idr3=&xtn2=&loyaltyCard=&idr2=&idr1=&thePassBusiness=false&cid=14812
Content-Length: 219
Cookie: subdomainPARTNER=NSINT; JSESSIONID=CB3FEB3AC72AD61A80BFED91D3FD96CA; www-20480=MHFBNLFDFAAA; campaignPos=5; www-47873=MGFBNLFDFAAA; __utma=1.993399624.1370027094.1370040145.1370082133.5; __utmc=1; __utmz=1.1370027094.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); BCSessionID=5dc05787-c2c8-43e1-9abe-93989970b087; BCPermissionLevel=PERSONAL; __utmb=1.1.10.1370082133
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
AJAXREQUEST=_viewRoot&doffersForm=doffersForm&doffersForm%3AvalueChanged=&doffersForm%3ArequestValid=true&javax.faces.ViewState=j_id3&doffersForm%3Aj_id937=doffersForm%3Aj_id937&valueChanged=false&AJAX%3AEVENTS_COUNT=1&

我想使用:

$headers   = array();
$headers[] = 'Cookie: ' . $cookie;

和:

curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);

其中:

$cookie = 'subdomainPARTNER=NSINT; JSESSIONID=CB3FEB3AC72AD61A80BFED91D3FD96CA; www-20480=MHFBNLFDFAAA; campaignPos=5; www-47873=MGFBNLFDFAAA; __utma=1.993399624.1370027094.1370040145.1370082133.5; __utmc=1; __utmz=1.1370027094.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); BCSessionID=5dc05787-c2c8-43e1-9abe-93989970b087; BCPermissionLevel=PERSONAL; __utmb=1.1.10.1370082133';

上面的cookie中的一些参数可能会从网站的内容中刮除,但不是全部。其中一些我可能能够从$ ckfile中读取,但我不知道如何做。特别是utma utmc,utmz,utmcsr,utmccn,utmcmd我不能从任何地方得到,我认为这些都是由javascript生成的。

Some of the parameters in the cookie above I might be able to scrape from the content of the website, but not all. Some of them I might be able to read from the $ckfile, but I don't know how to do that. Especially the utma utmc, utmz, utmcsr, utmccn, utmcmd I am not able to get from anywhere, I think these are generated by the javascript.

问题1:
我在当前代码中处理cookie时发生错误,因为很少发送cookie变量php curl和更多的浏览器?此外:浏览器和php curl发送的标题之间的其他差异是否会返回正确的内容的问题?

Question 1: Am I doing something wrong with the cookie handling in the current code as very few cookie variables are sent by php curl and a lot more by the browser? Further: can other differences between sent headers by browser and php curl be a problem to return the right content?

问题2:
缺少的cookie变量是否由于javascript设置这些cookie?

Question 2: Are the missing cookie variables due to the javascript setting those cookies?

问题3:
什么是最好的处理方式

Question 3: What is the best way to handle the cookies to make sure that all required cookies are being sent to the remote server?

您的帮助是非常受欢迎的!

Your help is very welcome!

推荐答案

如果cookie是从脚本生成的,那么您可以从文件中手动发送cookie(使用cookie文件选项)。例如:

If the cookie is generated from script, then you can send the cookie manually along with the cookie from the file(using cookie-file option). For example:

# sending manually set cookie
curl_setopt($ch, CURLOPT_HTTPHEADER, array("Cookie: test=cookie"));

# sending cookies from file
curl_setopt($ch, CURLOPT_COOKIEFILE, $ckfile);

在这种情况下,curl会将您定义的cookie与文件中的cookie一起发送。

In this case curl will send your defined cookie along with the cookies from the file.

如果cookie是通过javascrript生成的,那么你必须跟踪它如何生成,然后你可以使用上面的方法(通过http头)发送它。

If the cookie is generated through javascrript, then you have to trace it out how its generated and then you can send it using the above method(through http-header).

当从Mozilla发送Cookie时,会看到 utma utmc,utmz 。你不应该再担心这些东西了。

The utma utmc, utmz are seen when cookies are sent from Mozilla. You shouldn't bet worry about these things anymore.

最后,你做的方式是好的。只是确保你使用文件名的绝对路径(即 /var/dir/cookie.txt )而不是相对的。

Finally, the way you are doing is alright. Just make sure you are using absolute path for the file names(i.e. /var/dir/cookie.txt) instead of relative one.

使用curl时始终启用详细模式。它将帮助你很多跟踪请求。

Always enable the verbose mode when working with curl. It will help you a lot on tracing the requests. Also it will save lot of your times.

curl_setopt($ch, CURLOPT_VERBOSE, true);

这篇关于除了CURLOPT_COOKIEFILE之外,如何使用PHP curl发送Cookie?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆