file_get_contents from a URL that is only accessible after logging in to the website


Question

I would like to write a PHP script that can capture a page from a website. Think file_get_contents($url).

However, this website requires you to fill in a username/password log-in form before you can access any page. I imagine that once you are logged in, the website sends your browser an authentication cookie, and with every subsequent browser request the session info is passed back to the website to authenticate access.

I want to know how I can simulate this browser behavior with a PHP script in order to gain access and capture a page from this website.

More specifically, my questions are:

  1. How do I send a request that contains my log-in details so that the website replies with the session information/cookie?
  2. How do I read the session information/cookie?
  3. How do I pass this session information back to the website with every subsequent request (file_get_contents, curl)?

Thanks.

Recommended answer

cURL is well suited to this. You don't need to do anything special other than set the CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE options. Once you've logged in by posting the login form's fields, the cookie is saved and cURL automatically sends that same cookie with subsequent requests, as the example below illustrates.

Note that the function below saves the cookies to cookies/cookies.txt, so make sure that the directory/file exists and is writable.

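$loginUrl = 'http://example.com/login'; // the "action" URL of the login form
$loginFields = array('username' => 'user', 'password' => 'pass'); // login form field names and values
$remotePageUrl = 'http://example.com/remotepage.html'; // URL of the page you want to save

$login = getUrl($loginUrl, 'post', $loginFields); // log in; the session cookie is stored in the cookie jar

$remotePage = getUrl($remotePageUrl); // fetch the protected page using the stored cookie

function getUrl($url, $method = '', $vars = array()) {
    $cookieFile = 'cookies/cookies.txt'; // must exist in a writable directory
    $ch = curl_init();
    if ($method == 'post') {
        curl_setopt($ch, CURLOPT_POST, true);
        // Encode the fields as application/x-www-form-urlencoded, like a normal form submit;
        // passing the raw array would make cURL send multipart/form-data instead.
        curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($vars));
    }
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow the redirect many sites issue after login
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);  // cookies are written here when the handle is released
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile); // cookies are read from here for each request
    $buffer = curl_exec($ch);
    if ($buffer === false) {
        error_log('cURL error: ' . curl_error($ch)); // log the failure; the function then returns false
    }
    curl_close($ch);
    return $buffer;
}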
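
The answer above relies on cURL's cookie jar, but the question also asks how to read the cookie and pass it back with file_get_contents. A minimal sketch of that route follows, assuming the site sets its session through ordinary Set-Cookie headers. The helper names extractCookies, buildCookieHeader, and fetchAfterLogin are hypothetical, and the sketch ignores cookie attributes such as path, domain, and expiry.

```php
<?php
// Hypothetical helper: pull name => value pairs out of raw response header
// lines, such as the ones PHP places in $http_response_header.
function extractCookies(array $headerLines) {
    $cookies = array();
    foreach ($headerLines as $line) {
        if (stripos($line, 'Set-Cookie:') !== 0) {
            continue; // not a cookie header
        }
        $value = substr($line, strlen('Set-Cookie:'));
        $pair = explode(';', $value, 2)[0]; // drop attributes such as path or HttpOnly
        $parts = explode('=', $pair, 2);
        if (count($parts) === 2) {
            $cookies[trim($parts[0])] = trim($parts[1]);
        }
    }
    return $cookies;
}

// Hypothetical helper: turn the cookie array into a request header line.
function buildCookieHeader(array $cookies) {
    $pairs = array();
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return 'Cookie: ' . implode('; ', $pairs);
}

// Hypothetical helper: log in with a POST, then request the page with the
// cookies the login response set. Returns the page body, or false on failure.
function fetchAfterLogin($loginUrl, array $loginFields, $pageUrl) {
    $loginContext = stream_context_create(array('http' => array(
        'method'  => 'POST',
        'header'  => 'Content-Type: application/x-www-form-urlencoded',
        'content' => http_build_query($loginFields),
    )));
    if (file_get_contents($loginUrl, false, $loginContext) === false) {
        return false;
    }
    // The HTTP wrapper fills $http_response_header in the local scope.
    $cookies = extractCookies($http_response_header);
    $pageContext = stream_context_create(array('http' => array(
        'header' => buildCookieHeader($cookies),
    )));
    return file_get_contents($pageUrl, false, $pageContext);
}
```

Calling fetchAfterLogin('http://example.com/login', array('username' => 'user', 'password' => 'pass'), 'http://example.com/remotepage.html') would mirror the cURL example. For anything beyond a simple session cookie, the cookie jar approach is likely less error-prone, because cURL tracks cookie attributes itself.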
