Logging in to a web page with PHP and cURL


Question


I bought a book on web scraping with PHP. In it, the author logs in to https://www.packtpub.com/. The book is out of date, so I can't test its examples as written, because the site has changed since the book was released. Below is the modified code I am using, but the login fails, which I conclude from the "Account Options" string not appearing in the $results variable. What should I change? I believe the error comes from specifying destination incorrectly.

<?php
// Function to submit a form using the cURL POST method
function curlPost($postUrl, $postFields, $successString) {
    // Setting the user agent of a popular browser (kept on one line so no newline ends up in the header)
    $useragent = 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3';
    $cookie = 'cookie.txt';  // Setting a cookie file to store cookies
    $ch = curl_init();  // Initialising cURL session
    // Setting cURL options
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);  // Prevent cURL from verifying the SSL certificate
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);  // Skip the host name check as well
    curl_setopt($ch, CURLOPT_FAILONERROR, TRUE);  // Script should fail silently on error
    curl_setopt($ch, CURLOPT_COOKIESESSION, TRUE);  // Use cookies
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);  // Follow Location: headers
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);  // Return the transfer as a string
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);  // Setting cookie file
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);  // Setting cookie jar
    curl_setopt($ch, CURLOPT_USERAGENT, $useragent);  // Setting user agent
    curl_setopt($ch, CURLOPT_URL, $postUrl);  // Setting URL to POST to
    curl_setopt($ch, CURLOPT_POST, TRUE);  // Setting method as POST
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($postFields));  // Setting POST fields as a URL-encoded string
    $results = curl_exec($ch);  // Executing cURL session
    $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    echo "$httpcode";
    curl_close($ch);  // Closing cURL session
    // Checking if login was successful by checking for the success string.
    // strpos() can return 0 for a match at the start, so compare against false.
    if ($results !== false && strpos($results, $successString) !== false) {
        echo "I'm in.";
        return $results;
    } else {
        echo "Nope, sth went wrong.";
        return FALSE;
    }
}

$userEmail = 'youremail@email.com';  // Setting your email address for site login
$userPass = 'yourpass';  // Setting your password for site login
$postUrl = 'https://www.packtpub.com';  // Setting URL to POST to
// Setting form input fields as 'name' => 'value'
$postFields = array(
    'email' => $userEmail,
    'password' => $userPass,
    'destination' => 'https://www.packtpub.com',
    'form_id' => 'packt-user-login-form'
);
$successString = 'Account Options';
$loggedIn = curlPost($postUrl, $postFields, $successString);  // Executing curlPost login and storing results page in $loggedIn

EDIT: post request:

I replaced the line

'destination' => 'https://www.packtpub.com'

with

'op' => 'Login'

added

'form_build_id' => ''

and changed the URL to

$postUrl = 'https://www.packtpub.com/register';

since that is the URL I get when I choose "Copy as cURL" and paste the result into an editor.

I am still getting the "Nope, sth went wrong." message. I think it is because $successString is never present in what cURL returns in the first place. What is form_build_id supposed to be set to? It changes every time I log in.

Solution

The book you're using is old, and Packt Publishing have changed their website. It now includes a CSRF token, and without passing this you will never be able to log in.
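
To illustrate what passing the token involves, here is a minimal sketch that reads a hidden form field from a page using PHP's built-in DOM extension. The helper name extractHiddenField and the sample markup are assumptions made for illustration. Only the field name form_build_id comes from the question, and the real page may be structured differently.

```php
<?php
// Hypothetical helper (illustration only): returns the value attribute of the
// first <input> element with the given name, or null if none is found.
function extractHiddenField(string $html, string $fieldName): ?string
{
    if ($html === '') {
        return null; // DOMDocument::loadHTML() rejects empty input
    }

    $dom = new DOMDocument();
    // Real-world pages are rarely valid HTML, so collect parser warnings
    // internally instead of printing them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $loaded = $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return null;
    }

    $xpath = new DOMXPath($dom);
    // Assumes $fieldName contains no double quotes.
    $nodes = $xpath->query('//input[@name="' . $fieldName . '"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    return $nodes->item(0)->getAttribute('value');
}

// Made-up markup standing in for the login page.
$sampleHtml = '<form><input type="hidden" name="form_build_id" value="form-abc123"></form>';
echo extractHiddenField($sampleHtml, 'form_build_id'), PHP_EOL;
```

The value has to be fetched fresh from the page before each login attempt and sent back in the POST body, which is why a hard-coded or empty form_build_id does not work.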

I've developed a working solution. It uses pQuery to parse the HTML. You can install pQuery with Composer, or download the package and include it in your application. If you do the latter, remove the line require __DIR__ . '/vendor/autoload.php'; and replace it with an include that points to the location of the pQuery package on your system.

To test via the command line simply run: php packt_example.php.

You will also notice that many headers are not even required, such as the useragent. I have left these out.

<?php

require __DIR__ . '/vendor/autoload.php';

$email = 'myemail@gmail.com';
$password = 'mypassword';

# Initialize a cURL session.
$ch = curl_init('https://www.packtpub.com/register');

# Set the cURL options.
$options = [
    CURLOPT_COOKIEFILE      => 'cookies.txt',
    CURLOPT_COOKIEJAR       => 'cookies.txt',
    CURLOPT_RETURNTRANSFER  => 1
];

# Set the options
curl_setopt_array($ch, $options);

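# Execute, and stop early if the request itself failed
$html = curl_exec($ch);
if ($html === false) {
    exit('Request failed: ' . curl_error($ch));
}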

# Grab the CSRF token from the HTML source
$dom = pQuery::parseStr($html);
$csrfToken = $dom->query('[name="form_build_id"]')->val();

# Now that we have the form_build_id (aka the CSRF token), we can
# proceed with the POST request to log in. First,
# let's create an array of post data to send with the POST
# request.
$postData = [
    'email'         => $email,
    'password'      => $password,
    'op'            => 'Login',
    'form_build_id' => $csrfToken,
    'form_id'       => 'packt_user_login_form'
];


# Convert the post data array to URL encoded string
$postDataStr = http_build_query($postData);

# Append some fields to the CURL options array to make a POST request.
$options[CURLOPT_POST] = 1;
$options[CURLOPT_POSTFIELDS] = $postDataStr;
$options[CURLOPT_HEADER] = 1;

curl_setopt_array($ch, $options);

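# Execute, and stop early if the request itself failed
$response = curl_exec($ch);
if ($response === false) {
    exit('Request failed: ' . curl_error($ch));
}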

# Extract the headers from the response
$headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
$headers = substr($response, 0, $headerSize);

# Close cURL handle
curl_close($ch);

# If login is successful, the headers will contain a Location header
# pointing to the URL http://www.packtpub.com/index.
# strpos() returns false when there is no match, so compare strictly.
if (strpos($headers, 'packtpub.com/index') === false)
{
    print 'Login Failed';
    exit;
}

print 'Logged In';
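
The final check searches the whole header block for a substring. A slightly stricter variant, sketched below, reads the Location header itself and inspects only its value. It assumes, as described above, that a successful login answers with a redirect to packtpub.com/index. The helper name findLocationHeader and the sample headers are made up for illustration.

```php
<?php
// Hypothetical helper (illustration only): returns the value of the last
// Location header in a raw HTTP header block, or null if there is none.
function findLocationHeader(string $rawHeaders): ?string
{
    $location = null;
    foreach (preg_split('/\r\n|\n|\r/', $rawHeaders) as $line) {
        // Header names are case-insensitive, so match with stripos().
        if (stripos($line, 'Location:') === 0) {
            $location = trim(substr($line, strlen('Location:')));
        }
    }
    return $location;
}

// Made-up response headers standing in for a real login response.
$sampleHeaders = "HTTP/1.1 302 Found\r\n"
    . "Location: https://www.packtpub.com/index\r\n"
    . "Content-Length: 0\r\n\r\n";

$location = findLocationHeader($sampleHeaders);
$loggedIn = $location !== null && strpos($location, 'packtpub.com/index') !== false;
echo $loggedIn ? 'Logged In' : 'Login Failed', PHP_EOL;
```

In the script above, $headers could be passed to the helper in place of $sampleHeaders.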
