PHP: Do I need to use cookies in this cURL script?


Question

The following script:

<?php
$sDataFile = '<path>\journal-issue-ToC.htm';
$sURL = 'https://onlinelibrary.wiley.com/toc/14678624/2014/85/1';
$bHeader = false;
$sCAinfo = '<path>\cacert.pem';

$cURLhandle = curl_init();
$FilePointer = fopen($sDataFile, 'wb');

curl_setopt($cURLhandle, CURLOPT_URL, $sURL);
curl_setopt($cURLhandle, CURLOPT_FILE, $FilePointer);
curl_setopt($cURLhandle, CURLOPT_HEADER, $bHeader);
curl_setopt($cURLhandle, CURLOPT_CAINFO, $sCAinfo);

curl_exec($cURLhandle);

curl_close($cURLhandle);
fclose($FilePointer);

saves the file "journal-issue-ToC.htm", which contains only the following line:

The URL has moved <a href="https://onlinelibrary.wiley.com/toc/14678624/2014/85/1?cookieSet=1">here</a>

If I open this file in a browser, it says "The URL has moved here", with the word "here" linked to the desired URL suffixed with "?cookieSet=1". If I click that link, it takes me to the page I am trying to save with cURL.

I thought I could simulate clicking that link by appending "?cookieSet=1" to the URL and calling curl_exec() a second time, so I added three lines to the script:

<?php
$sDataFile = '<path>\journal-issue-ToC-2.htm';
$sURL = 'https://onlinelibrary.wiley.com/toc/14678624/2014/85/1';
$bHeader = false;
$sCAinfo = '<path>\cacert.pem';

$cURLhandle = curl_init();
$FilePointer = fopen($sDataFile, 'wb');

curl_setopt($cURLhandle, CURLOPT_URL, $sURL);
curl_setopt($cURLhandle, CURLOPT_FILE, $FilePointer);
curl_setopt($cURLhandle, CURLOPT_HEADER, $bHeader);
curl_setopt($cURLhandle, CURLOPT_CAINFO, $sCAinfo);

curl_exec($cURLhandle);

$sURL .= '?cookieSet=1';
curl_setopt($cURLhandle, CURLOPT_URL, $sURL);
curl_exec($cURLhandle);

curl_close($cURLhandle);
fclose($FilePointer);

This script saves the file "journal-issue-ToC-2.htm", which contains only the following two lines:

The URL has moved <a href="https://onlinelibrary.wiley.com/toc/14678624/2014/85/1?cookieSet=1">here</a>
The URL has moved <a href="http://onlinelibrary.wiley.com/action/cookieAbsent">here</a>

If I open this file in a browser, it says "The URL has moved here" twice. The first "here" links to the desired URL suffixed as before, and the second "here" links to the useless page "http://onlinelibrary.wiley.com/action/cookieAbsent".

I Googled php curl "The URL has moved here". Most of the results were in other languages, and none gave any hint of what causes this behavior or how to get past it and actually retrieve the desired page.

I wonder whether the problem is that I need to do something with cookies in curl_setopt(). I haven't worked with cookies before, and after reading about the cookie options for curl_setopt() I feel a bit lost. Can someone explain what is going on in these scripts and what I need to change to get them to work?

I'm running PHP 7.2.2 on IIS 7.5 under Windows 7 64-bit.

Recommended answer


Do I need to use cookies in this cURL script?

Yes. You have to set up cURL to store and update the cookies it receives from the website and to send them back with each request.

Furthermore, because the site serves content only when cookies are sent back, you have to issue two requests. The first one just lets the cookies be received and stored. The second one, which sends the stored cookies back, gets the actual content.

To store the cookies received and send them with each request, you need these lines:

curl_setopt($cURLhandle, CURLOPT_COOKIEFILE, "path_to\cookies.txt");
curl_setopt($cURLhandle, CURLOPT_COOKIEJAR,  "path_to\cookies.txt");

path_to\cookies.txt is the absolute path to the file that stores the cookies locally. The file is created on the first call, so the target directory must be readable and writable.
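Since a cookie file that cannot be written tends to fail silently, it may help to check the path before making any request. The following is a minimal sketch; the helper name cookieFileUsable() is hypothetical and not part of cURL or of the original answer.

```php
<?php
// Hypothetical helper: reports whether the cookie file could be created or updated.
function cookieFileUsable(string $cookieFile): bool
{
    // An existing file must be both readable and writable.
    if (file_exists($cookieFile)) {
        return is_readable($cookieFile) && is_writable($cookieFile);
    }

    // Otherwise the parent directory must exist and be writable,
    // so that the file can be created on the first call.
    $dir = dirname($cookieFile);
    return is_dir($dir) && is_writable($dir);
}
```

Calling it with the same path you pass to CURLOPT_COOKIEJAR, before curl_exec(), gives an early hint when permissions are the problem.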

Finally, make two cURL calls:

1) Load just the home page https://onlinelibrary.wiley.com/

2) Load the desired page https://onlinelibrary.wiley.com/toc/14678624/2014/85/1

Note that if you are going to fetch several pages, you need step 1 only the first time.
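Put together, the two-request approach described above might look like the sketch below. The function name fetchWithCookies() and its parameter names are hypothetical, and returning the page as a string through CURLOPT_RETURNTRANSFER is an assumption made here for brevity; the question's script writes to a file with CURLOPT_FILE instead. Whether the site actually serves the page on the second request has not been verified here.

```php
<?php
// Sketch only: fetchWithCookies() and its parameter names are hypothetical.

/**
 * Visit $warmUpUrl so the site can set its cookies, then fetch $targetUrl
 * with the same handle. Returns the second response body, or false on failure.
 */
function fetchWithCookies(string $warmUpUrl, string $targetUrl, string $cookieFile, string $caInfo)
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_COOKIEFILE     => $cookieFile, // enables cookie handling; loads the file if it exists
        CURLOPT_COOKIEJAR      => $cookieFile, // cookies are written here when the handle is closed
        CURLOPT_CAINFO         => $caInfo,     // CA bundle, as in the question's script
        CURLOPT_HEADER         => false,
        CURLOPT_RETURNTRANSFER => true,        // return the body instead of printing it
    ]);

    // Request 1: only to receive and store cookies; the body is discarded.
    curl_setopt($ch, CURLOPT_URL, $warmUpUrl);
    if (curl_exec($ch) === false) {
        curl_close($ch);
        return false;
    }

    // Request 2: same handle, so the cookies received above are sent back.
    curl_setopt($ch, CURLOPT_URL, $targetUrl);
    $body = curl_exec($ch);

    curl_close($ch); // flushes the cookies to $cookieFile
    return $body;
}

// Example call (paths are placeholders, as in the question):
// $html = fetchWithCookies(
//     'https://onlinelibrary.wiley.com/',
//     'https://onlinelibrary.wiley.com/toc/14678624/2014/85/1',
//     '<path>\cookies.txt',
//     '<path>\cacert.pem'
// );
```

Reusing one handle keeps the cookies in memory between the two requests. The cookie file matters mainly for later runs, where step 1 can then be skipped.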
