PHP Screen Scraping and Sessions


Question

OK, I'm still new to the screen scraping thing.

I've managed to log in to the site I need, but how do I now move on to another page? After logging in, I do another GET request for the page I need, but that page redirects me back to the login page.

So I'm thinking the session variables are not being passed. How can I overcome this?

Problem:

Even if I POST to the second page's URL, it still redirects me back to the login page unless I'm already logged in. Is the screen scraping code not allowing the session data to be passed?

I found this code in another screen scraper question here on Stack Overflow:

class Curl {

    public $cookieJar = "";

    public function __construct($cookieJarFile = 'cookies.txt') {
        $this->cookieJar = $cookieJarFile;
    }

    function setup() {
        $header = array();
        $header[0]  = "Accept: text/xml,application/xml,application/xhtml+xml,";
        $header[0] .= "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
        $header[]   = "Cache-Control: max-age=0";
        $header[]   = "Connection: keep-alive";
        $header[]   = "Keep-Alive: 300";
        $header[]   = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
        $header[]   = "Accept-Language: en-us,en;q=0.5";
        $header[]   = "Pragma: "; // browsers keep this blank.

        curl_setopt($this->curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.7) Gecko/20070914 Firefox/2.0.0.7');
        curl_setopt($this->curl, CURLOPT_HTTPHEADER, $header);
        curl_setopt($this->curl, CURLOPT_COOKIEJAR, $cookieJar);
        curl_setopt($this->curl, CURLOPT_COOKIEFILE, $cookieJar);
        curl_setopt($this->curl, CURLOPT_AUTOREFERER, true);
        curl_setopt($this->curl, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($this->curl, CURLOPT_RETURNTRANSFER, true);
    }

    function get($url) {
        $this->curl = curl_init($url);
        $this->setup();

        return $this->request();
    }

    function getAll($reg, $str) {
        preg_match_all($reg, $str, $matches);
        return $matches[1];
    }

    function postForm($url, $fields, $referer = '') {
        $this->curl = curl_init($url);
        $this->setup();
        curl_setopt($this->curl, CURLOPT_URL, $url);
        curl_setopt($this->curl, CURLOPT_POST, 1);
        curl_setopt($this->curl, CURLOPT_REFERER, $referer);
        curl_setopt($this->curl, CURLOPT_POSTFIELDS, $fields);
        return $this->request();
    }

    function getInfo($info) {
        $info = ($info == 'lasturl') ? curl_getinfo($this->curl, CURLINFO_EFFECTIVE_URL) : curl_getinfo($this->curl, $info);
        return $info;
    }

    function request() {
        return curl_exec($this->curl);
    }
}

Calling the class:

include('/var/www/html/curl.php');
$curl = new Curl();

$url = "here.com";
$newURL = "here.com/newpage.php";

$fields = "usr=user1&pass=PassWord";

// Calling URL
$referer = "http://here.com/index.php";

$html = $curl->postForm($url, $fields, $referer);
$html = $curl->get($newURL);

echo $html; // takes me back to $url instead of $newURL


Recommended answer

The following lines do not use "$this", and $cookieJar isn't defined in the local scope of setup():

curl_setopt($this->curl, CURLOPT_COOKIEJAR, $cookieJar);
curl_setopt($this->curl, CURLOPT_COOKIEFILE, $cookieJar);

So they should look like:

    curl_setopt($this->curl, CURLOPT_COOKIEJAR, $this->cookieJar);
    curl_setopt($this->curl, CURLOPT_COOKIEFILE, $this->cookieJar);
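To see why this matters, here is a minimal sketch that needs no network access. JarDemo and its two methods are hypothetical names made up for this illustration; they are not part of the Curl class above. Inside a method, a bare $cookieJar is just an undefined local variable, so cURL never receives the cookies.txt path, while $this->cookieJar reads the property set in the constructor.

```php
<?php
// Hypothetical demo class, used only to illustrate variable scope.
class JarDemo {
    public $cookieJar = "";

    public function __construct($cookieJarFile = 'cookies.txt') {
        $this->cookieJar = $cookieJarFile;
    }

    // Mirrors the bug: $cookieJar was never assigned in this method,
    // so it is not set and we return null.
    public function localVariable() {
        return isset($cookieJar) ? $cookieJar : null;
    }

    // Mirrors the fix: reads the property assigned in the constructor.
    public function property() {
        return $this->cookieJar;
    }
}

$demo = new JarDemo();
var_dump($demo->localVariable()); // NULL
var_dump($demo->property());      // string(11) "cookies.txt"
```

Turning on error reporting while developing (for example error_reporting(E_ALL)) should surface this kind of mistake as an "undefined variable" notice or warning.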

If that doesn't fix the issue, try doing only the POST:

$curl->postForm($url, $fields, $referer);

and not the

$curl->get($newURL);

Then check whether the cookies.txt file contains anything. Does it get created at all? Let us know the results, as it's hard to quickly test your code without the actual URL being hit.

If it isn't creating a cookies.txt file, then you can almost guarantee that the session isn't being kept between requests.
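The check described above can be sketched roughly as follows. countJarCookies is a hypothetical helper name, and the sketch assumes the jar is in the Netscape text format that cURL writes, where comment lines start with "#" and HttpOnly cookies are stored on lines prefixed with "#HttpOnly_". As far as I know, cURL only writes the jar when the handle is closed or destroyed, so run the check after the scraping script has finished.

```php
<?php
// Hypothetical helper: returns null if the jar file does not exist,
// otherwise the number of cookie lines found in it.
function countJarCookies($path) {
    clearstatcache(true, $path);
    if (!is_file($path)) {
        return null;
    }
    $lines = file($path, FILE_IGNORE_NEW_LINES);
    if ($lines === false) {
        return null; // file exists but could not be read
    }
    $count = 0;
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line === '') {
            continue; // skip blank lines
        }
        // "#HttpOnly_" lines are real cookies; other "#" lines are comments.
        $isComment = $line[0] === '#' && strpos($line, '#HttpOnly_') !== 0;
        if (!$isComment) {
            $count++;
        }
    }
    return $count;
}

$count = countJarCookies('cookies.txt');
if ($count === null) {
    echo "cookies.txt was not created\n";
} elseif ($count === 0) {
    echo "cookies.txt exists but holds no cookies\n";
} else {
    echo "cookies.txt holds $count cookie line(s)\n";
}
```

If the file is missing or empty after the login POST, the session cookie was never saved, which would explain the redirect back to the login page.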
