file_get_contents from a URL that is only accessible after logging in to a website


Question

I would like to write a PHP script that can capture a page from a website. Think file_get_contents($url).

However, this website requires you to fill in a username/password log-in form before you can access any page. I imagine that once you are logged in, the website sends your browser an authentication cookie, and with every subsequent request the browser passes the session info back to the website to authenticate access.

I want to know how I can simulate this browser behavior with a PHP script in order to gain access and capture a page from this website.

More specifically, my questions are:


  1. How do I send a request that contains my log-in details so that the website replies with the session information/cookie?
  2. How do I read the session information/cookie?
  3. How do I pass this session information back to the website with every subsequent request (file_get_contents, curl)?

Thanks.

Recommended answer

Curl is well suited to this. You don't need to do anything special other than set the CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE options. Once you have logged in by posting the site's login form fields, the cookie is saved, and Curl automatically sends that same cookie with subsequent requests, as the example below illustrates.

Note that the function below saves the cookies to 'cookies/cookies.txt', so make sure that directory/file exists and can be written to.

$loginUrl = 'http://example.com/login'; // action URL of the login form
$loginFields = array('username' => 'user', 'password' => 'pass'); // login form field names and values
$remotePageUrl = 'http://example.com/remotepage.html'; // URL of the page you want to save

$login = getUrl($loginUrl, 'post', $loginFields); // log in to the site

$remotePage = getUrl($remotePageUrl); // get the remote page

function getUrl($url, $method = '', $vars = array()) {
    $cookieFile = 'cookies/cookies.txt'; // relative to the working directory; must be writable
    $ch = curl_init();
    if ($method == 'post') {
        curl_setopt($ch, CURLOPT_POST, 1);
        // URL-encode the fields, as a regular HTML form submission would
        curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($vars));
    }
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // follow redirects, e.g. after a successful login
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);  // cookies are written here when the handle is released
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile); // cookies are read from here before the request
    $buffer = curl_exec($ch); // response body, or false on failure
    curl_close($ch);
    return $buffer;
}
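
The question also asks how this could be done with file_get_contents itself. The following is only a rough sketch of one way to do it, not part of the original answer: it reads the Set-Cookie lines that PHP's HTTP stream wrapper exposes in $http_response_header and sends them back through a stream context. The helper names extractCookies, buildCookieHeader and fetchWithCookies are hypothetical, and the sketch assumes the site uses plain cookie-based sessions without CSRF tokens or JavaScript-driven login. Cookie attributes such as Path, Domain and Expires are ignored, which Curl's cookie jar would handle for you.

```php
<?php
// Sketch only: helper names are hypothetical, not from the original answer.

/** Collect name => value pairs from the Set-Cookie lines of a response. */
function extractCookies(array $responseHeaders): array
{
    $cookies = [];
    foreach ($responseHeaders as $header) {
        if (stripos($header, 'Set-Cookie:') !== 0) {
            continue; // not a cookie header
        }
        // Keep only "name=value"; attributes (Path, Expires, ...) follow the first ';'.
        $pair = trim(explode(';', substr($header, strlen('Set-Cookie:')), 2)[0]);
        if (strpos($pair, '=') === false) {
            continue; // malformed cookie, skip it
        }
        [$name, $value] = explode('=', $pair, 2);
        $cookies[trim($name)] = trim($value);
    }
    return $cookies;
}

/** Build the value of a Cookie request header from name => value pairs. */
function buildCookieHeader(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return implode('; ', $pairs);
}

/**
 * Fetch a URL with file_get_contents, sending the given cookies and
 * optionally POSTing form fields. Returns [body, updated cookies].
 */
function fetchWithCookies(string $url, array $cookies = [], ?array $postFields = null): array
{
    $headers = [];
    if ($cookies) {
        $headers[] = 'Cookie: ' . buildCookieHeader($cookies);
    }
    // Redirects are not followed, so the cookies set by this exact response are captured.
    $http = ['method' => 'GET', 'follow_location' => 0, 'ignore_errors' => true];
    if ($postFields !== null) {
        $http['method'] = 'POST';
        $http['content'] = http_build_query($postFields);
        $headers[] = 'Content-Type: application/x-www-form-urlencoded';
    }
    $http['header'] = implode("\r\n", $headers);

    $body = file_get_contents($url, false, stream_context_create(['http' => $http]));
    // The HTTP wrapper fills $http_response_header in the calling scope.
    $newCookies = extractCookies($http_response_header ?? []);
    return [$body, array_replace($cookies, $newCookies)];
}

// Usage, with the placeholder URLs and field names from the answer above:
// [$login, $cookies] = fetchWithCookies('http://example.com/login', [],
//     ['username' => 'user', 'password' => 'pass']);
// [$remotePage, $cookies] = fetchWithCookies('http://example.com/remotepage.html', $cookies);
```

For anything beyond a simple session cookie, the Curl approach above is likely the easier choice, since it tracks cookie scope and expiry and follows redirects on its own.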
