Logging in to a webpage with PHP and cURL
I bought a book on web scraping with PHP. In it, the author logs in to https://www.packtpub.com/. The book is out of date, so I can't really test its ideas, because the site has changed since the book was released. Below is the modified code I am using, but the login fails, which I concluded from the string "Account Options" not being in the $results variable. What should I change? I believe the error comes from specifying destination incorrectly.
<?php
// Function to submit a form using the cURL POST method
function curlPost($postUrl, $postFields, $successString) {
    // Setting user agent of a popular browser
    $useragent = 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3';
    $cookie = 'cookie.txt'; // Setting a cookie file to store cookies
    $ch = curl_init(); // Initialising cURL session
    // Setting cURL options
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE); // Prevent cURL from verifying the SSL certificate
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0); // Prevent cURL from verifying the certificate's host name
    curl_setopt($ch, CURLOPT_FAILONERROR, TRUE); // Script should fail silently on HTTP errors
    curl_setopt($ch, CURLOPT_COOKIESESSION, TRUE); // Start a new cookie session
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE); // Follow Location: headers
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Return the transfer as a string
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie); // Setting cookie file
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie); // Setting cookie jar
    curl_setopt($ch, CURLOPT_USERAGENT, $useragent); // Setting user agent
    curl_setopt($ch, CURLOPT_URL, $postUrl); // Setting URL to POST to
    curl_setopt($ch, CURLOPT_POST, TRUE); // Setting method as POST
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($postFields)); // Setting POST fields (URL-encoded from the array)
    $results = curl_exec($ch); // Executing cURL session
    $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    echo "$httpcode";
    curl_close($ch); // Closing cURL session
    // Checking if login was successful by checking existence of string
    if (strpos($results, $successString)) {
        echo "I'm in.";
        return $results;
    } else {
        echo "Nope, sth went wrong.";
        return FALSE;
    }
}
$userEmail = 'youremail@email.com'; // Setting your email address for site login
$userPass = 'yourpass'; // Setting your password for site login
$postUrl = 'https://www.packtpub.com'; // Setting URL to POST to
// Setting form input fields as 'name' => 'value'
$postFields = array(
    'email' => $userEmail,
    'password' => $userPass,
    'destination' => 'https://www.packtpub.com',
    'form_id' => 'packt-user-login-form'
);
$successString = 'Account Options';
$loggedIn = curlPost($postUrl, $postFields, $successString); // Executing curlPost login and storing results page in $loggedIn
EDIT: post request:
I replaced the line 'destination' => 'https://www.packtpub.com' with 'op' => 'Login', added 'form_build_id' => '', and changed the URL to $postUrl = 'https://www.packtpub.com/register'; since that is the URL I get when choosing "Copy as cURL" in the browser and pasting the result into an editor.
I am still getting the "Nope, sth went wrong." message. I think it is because $successString doesn't end up in the cURL response in the first place. What is form_build_id supposed to be set to? It changes every time I log in.
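The book you're using is old, and Packt Publishing has changed its website since. The login form now includes a CSRF token (the form_build_id field), and without passing it you will never be able to log in. Because the token changes between page loads, it has to be read from the login page before each login attempt rather than hard-coded.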
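As a rough, offline illustration of what reading such a token involves, here is a minimal sketch that pulls the value of a hidden form_build_id input out of an HTML string using PHP's bundled DOM extension. The HTML fragment, the token value form-EXAMPLE123 and the helper name hiddenFieldValue are made up for this example; the real page's markup may differ.

```php
<?php
// Hypothetical login-page fragment; the real markup on the site may differ.
$html = <<<HTML
<form id="packt-user-login-form" method="post">
  <input type="text" name="email">
  <input type="password" name="password">
  <input type="hidden" name="form_build_id" value="form-EXAMPLE123">
  <input type="hidden" name="form_id" value="packt_user_login_form">
</form>
HTML;

// Hypothetical helper: returns the value attribute of the first <input>
// with the given name, or null if no such input exists.
function hiddenFieldValue(string $html, string $name): ?string
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parse warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query(sprintf('//input[@name="%s"]', $name));
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    $node = $nodes->item(0);
    return $node instanceof DOMElement ? $node->getAttribute('value') : null;
}

echo hiddenFieldValue($html, 'form_build_id'), PHP_EOL;
```

In a real run, $html would be the body returned by a GET request to the login page, made with the same cookie jar that the later POST uses.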
I've developed a working solution. It uses pQuery for parsing the HTML. You can install it using Composer, or download the package and include it in your application manually. If you download it manually, remove the line require __DIR__ . '/vendor/autoload.php'; and replace it with a require pointing to the location of the pquery package on your system.
To test via the command line, simply run: php packt_example.php
You will also notice that many headers, such as the user agent, are not even required. I have left these out.
<?php
require __DIR__ . '/vendor/autoload.php';
$email = 'myemail@gmail.com';
$password = 'mypassword';
# Initialize a cURL session.
$ch = curl_init('https://www.packtpub.com/register');
# Set the cURL options. The cookie file/jar keep the session
# between the GET and the POST request.
$options = [
    CURLOPT_COOKIEFILE => 'cookies.txt',
    CURLOPT_COOKIEJAR => 'cookies.txt',
    CURLOPT_RETURNTRANSFER => 1
];
# Set the options
curl_setopt_array($ch, $options);
# Execute the GET request to fetch the login page
$html = curl_exec($ch);
# Grab the CSRF token from the HTML source
$dom = pQuery::parseStr($html);
$csrfToken = $dom->query('[name="form_build_id"]')->val();
# Now that we have the form_build_id (aka the CSRF token) we can
# proceed with making the POST request to log in. First,
# let's create an array of post data to send with the POST
# request.
$postData = [
    'email' => $email,
    'password' => $password,
    'op' => 'Login',
    'form_build_id' => $csrfToken,
    'form_id' => 'packt_user_login_form'
];
# Convert the post data array to a URL-encoded string
$postDataStr = http_build_query($postData);
# Append some fields to the cURL options array to make a POST request.
$options[CURLOPT_POST] = 1;
$options[CURLOPT_POSTFIELDS] = $postDataStr;
$options[CURLOPT_HEADER] = 1;
curl_setopt_array($ch, $options);
# Execute the POST request
$response = curl_exec($ch);
# Extract the headers from the response
$headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
$headers = substr($response, 0, $headerSize);
# Close cURL handle
curl_close($ch);
# If login is successful, the headers will contain a location header
# to the url http://www.packtpub.com/index
# (compare with false explicitly, since strpos() can legitimately return 0)
if (strpos($headers, 'packtpub.com/index') === false)
{
    print 'Login Failed';
    exit;
}
print 'Logged In';
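The success check above searches the whole header block for a substring. A slightly stricter variant, sketched below under assumptions, first isolates the Location header and then tests only its value. The helper name locationHeader and the sample response headers are invented for illustration; they are not taken from a real Packt response.

```php
<?php
// Hypothetical helper: returns the value of the first Location header
// found in a raw header block, or null if there is none.
function locationHeader(string $rawHeaders): ?string
{
    foreach (preg_split('/\r\n|\n|\r/', $rawHeaders) as $line) {
        // Header names are case-insensitive, so match "Location:" loosely.
        if (stripos($line, 'Location:') === 0) {
            return trim(substr($line, strlen('Location:')));
        }
    }
    return null;
}

// Made-up response headers for illustration only.
$headers = "HTTP/1.1 302 Found\r\nLocation: https://www.packtpub.com/index\r\nContent-Length: 0\r\n\r\n";

$location = locationHeader($headers);
// Compare against false explicitly: strpos() returns 0 for a match at offset 0.
$loggedIn = $location !== null && strpos($location, 'packtpub.com/index') !== false;

echo $loggedIn ? 'Logged In' : 'Login Failed', PHP_EOL;
```

This relies on redirects not being followed automatically (CURLOPT_FOLLOWLOCATION is left unset in the solution above), so the redirect response's own headers are the ones captured.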