Scrape a password-protected ASP page

Question

I would like to develop an automatic scraper for a password-protected ASP web page. I have a login and password for this page.

First, I looked at the Firebug log while logging in through Firefox. This is what I found:

  1. When I open the login page (i.e. http://mysite), I get a cookie named "__RequestVerificationToken".
  2. When I press the Login button, Firefox makes a POST request to http://mysite/Account/Login with the parameters UserName, Password and __RequestVerificationToken. It also sends the cookie saved in step 1.
  3. If authorization succeeds, I get another cookie, .ASPXAUTH, and am redirected to http://mysite/Account/Index (the page I want to scrape).

My code:

    //1. Get __RequestVerificationToken cookie

    $urlLogin = "http://mysite";
    $cookieFile = "cookie.txt";
    $regs=array();
    
    $ch = curl_init();
    
    curl_setopt($ch, CURLOPT_URL, $urlLogin);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($ch, CURLOPT_VERBOSE, TRUE);
    curl_setopt($ch, CURLOPT_STDERR,$f = fopen("answer.txt", "w+"));
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 5.1; rv:18.0) Gecko/20100101 Firefox/18.0' );
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile); 
    
    $data=curl_exec($ch);

    //2. Parse token value for the post request

    $hash=file_get_contents("answer.txt");
    preg_match_all('/=(.*); p/i',$hash, $regs);

    //3. Make a post request

    $postData = '__RequestVerificationToken='.$regs[1][0].'&UserName=someLogin'.'&Password=somePassword';
    $urlSecuredPage = "http://mysite/Account/Login";
    curl_setopt($ch, CURLOPT_URL, $urlSecuredPage); 
    curl_setopt($ch, CURLOPT_POST, TRUE);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile); 
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile); 

    $data = curl_exec($ch);
    curl_close($ch);

At step 3, the cookie I saved in step 1 is overwritten with a new value of __RequestVerificationToken. I don't understand why this happens. As a result, I cannot log in because the __RequestVerificationToken is wrong, and I get an HTTP 500 error.

Where am I going wrong?

Recommended answer

There should be two __RequestVerificationToken values: one in a hidden input field of the form, and one in the cookie. The hidden input value is sent with each POST request, and it has a new value for each request. It depends on the cookie value.

So you need to save both the hidden input value and the cookie, and send them back together. If you don't send the value from the hidden input, ASP.NET MVC treats the request as an attack and generates a new cookie. A new cookie is generated only if validation fails or the cookie itself doesn't exist. If you have the cookie and always send the __RequestVerificationToken input value with the POST request, a new cookie should not be generated.
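
The flow described above can be sketched in PHP roughly as follows. This is a minimal sketch, not a tested solution for the site in question: `extractToken()` and `login()` are hypothetical helper names, the URLs and field names are the ones given in the question, and it assumes the login form contains an `<input name="__RequestVerificationToken">` element. Note that the code in the question appears to take the token from the cookie header in the verbose log, whereas here it is read from the hidden input in the returned HTML.

```php
<?php
// Sketch under stated assumptions: extractToken() and login() are
// hypothetical helpers; URLs and field names come from the question.

/** Return the value of the hidden __RequestVerificationToken input, or null. */
function extractToken(string $html): ?string
{
    if ($html === '') {
        return null;
    }
    $doc = new DOMDocument();
    // "@" suppresses warnings that loadHTML emits for non-strict HTML.
    if (!@$doc->loadHTML($html)) {
        return null;
    }
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//input[@name="__RequestVerificationToken"]/@value');
    return ($nodes !== false && $nodes->length > 0) ? $nodes->item(0)->nodeValue : null;
}

/** GET the form, then POST credentials plus the hidden token on the same handle. */
function login(string $formUrl, string $postUrl, string $user, string $pass, string $cookieFile): string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_COOKIEFILE     => $cookieFile, // turns on cookie handling from the first request
        CURLOPT_COOKIEJAR      => $cookieFile, // cookies are written when the handle is released
        CURLOPT_URL            => $formUrl,
    ]);

    // Step 1: the response sets the token cookie and carries the hidden input.
    $html  = curl_exec($ch);
    $token = is_string($html) ? extractToken($html) : null;
    if ($token === null) {
        throw new RuntimeException('Hidden __RequestVerificationToken not found');
    }

    // Step 2: send the hidden-input value; the cookie travels with the same handle.
    curl_setopt($ch, CURLOPT_URL, $postUrl);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query([
        '__RequestVerificationToken' => $token,
        'UserName'                   => $user,
        'Password'                   => $pass,
    ]));
    $body = curl_exec($ch);
    unset($ch); // release the handle so the cookie jar is flushed to disk
    if (!is_string($body)) {
        throw new RuntimeException('Login request failed');
    }
    return $body;
}

// Usage (placeholders from the question):
// $page = login('http://mysite', 'http://mysite/Account/Login',
//               'someLogin', 'somePassword', 'cookie.txt');
```

Reusing one cURL handle for both requests keeps the cookie and the hidden-input value paired, which is what the validation described above depends on.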

If a new cookie is still generated, you are sending an incorrect __RequestVerificationToken value from the hidden input. Try replaying the same requests with Fiddler or Charles and check whether they return a successful result.
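
Alongside a proxy such as Fiddler or Charles, the script itself can check whether the login worked by looking for the .ASPXAUTH cookie mentioned in the question. The sketch below is only illustrative: `cookieNames()` and `isLoggedIn()` are hypothetical helpers, and it assumes the cookie jar is the Netscape-format text file that cURL writes, where HttpOnly cookies are prefixed with `#HttpOnly_`.

```php
<?php
// Sketch only: cookieNames() and isLoggedIn() are hypothetical helpers.

/** Return the cookie names found in the text of a Netscape-format cookie jar. */
function cookieNames(string $jarContents): array
{
    $names = [];
    foreach (preg_split('/\R/', $jarContents) as $line) {
        if (strncmp($line, '#HttpOnly_', 10) === 0) {
            // cURL marks HttpOnly cookies with this prefix; strip it and parse as usual.
            $line = substr($line, 10);
        } elseif ($line === '' || $line[0] === '#') {
            continue; // blank line or ordinary comment
        }
        // Fields: domain, subdomain flag, path, secure, expiry, name, value.
        $fields = explode("\t", $line);
        if (count($fields) >= 7) {
            $names[] = $fields[5];
        }
    }
    return $names;
}

/** True if the jar file contains the .ASPXAUTH cookie named in the question. */
function isLoggedIn(string $cookieFile): bool
{
    $contents = is_readable($cookieFile) ? (string) file_get_contents($cookieFile) : '';
    return in_array('.ASPXAUTH', cookieNames($contents), true);
}

// Usage: call after the cURL handle has been released, so the jar is on disk.
// var_dump(isLoggedIn('cookie.txt'));
```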

Both tokens are used to prevent CSRF attacks.
