How did a harmless crawler bypass WebForms authentication, and hijack a user's session?


Question

Last night a customer called, frantic, because Google had cached versions of private employee information. The information is not available unless you log in.

They had done a Google search for their domain, e.g.:

site:example.com

and noticed that Google had crawled, and cached, some internal pages.

Looking at the cached versions of the pages myself:

This is Google's cache of https://example.com/(F(NSvQJ0SS3gYRJB4UUcDa1z7JWp7Qy7Kb76XGu8riAA1idys-nfR1mid8Qw7sZH0DYcL64GGiB6FK_TLBy3yr0KnARauyjjDL3Wdf1QcS-ivVwWrq-htW_qIeViQlz6CHtm0faD8qVOmAzdArbgngDfMMSg_N4u45UysZxTnL3d6mCX7pe2Ezj0F21g4w9VP57ZlXQ_6Rf-HhK8kMBxEdtlrEm2gBwBhOCcf_f71GdkI1))/ViewTransaction.aspx?transactionNumber=12345. It is a snapshot of the page as it appeared on 15 Sep 2013 00:07:22 GMT

I was confused by the long URL. Rather than:

https://example.com/ViewTransaction.aspx?transactionNumber=12345

it had a long string inserted:

https://example.com/[...snip...]/ViewTransaction.aspx?transactionNumber=12345

It took me a few minutes to remember: that might be a symptom of ASP.net's "cookie-less sessions". If your browser does not support Set-Cookie, the web-site will embed the cookie value in the URL instead.

Except our site doesn't use that.

And even if our site did have cookie-less sessions auto-detected, and Google managed to cajole the web-server into handing it a session in the url, how did it take over another user's session?

The site has been crawled by bots for years. And this past May 29 was no different.

Google usually starts its crawl by checking the robots.txt file (we don't have one). But nobody is allowed to read anything on the site (including robots.txt) without first being authenticated, so it fails:

Time      Uri                      Port  User Name         Status
========  =======================  ====  ================  ======
1:33:04   GET /robots.txt          80                      302    ;not authenticated, see /Account/Login.aspx
1:33:04   GET /Account/Login.aspx  80                      302    ;use https please
1:33:04   GET /Account/Login.aspx  443                     200    ;go ahead, try to login

All that time Google was looking for a robots.txt file. It never got one. Then it returns to try to crawl the root:

Time      Uri                      Port  User Name         Status
========  =======================  ====  ================  ======
1:33:04   GET /                    80                      302    ;not authenticated, see /Account/Login.aspx
1:33:04   GET /Account/Login.aspx  80                      302    ;use https please
1:33:04   GET /Account/Login.aspx  443                     200    ;go ahead, try to login

And another check of robots.txt on the secure site:

Time      Uri                      Port  User Name         Status
========  =======================  ====  ================  ======
1:33:04   GET /robots.txt          443                     302    ;not authenticated, see /Account/Login.aspx
1:33:04   GET /Account/Login.aspx  443                     200    ;go ahead, try to login

And then the stylesheet on the login page:

Time      Uri                      Port  User Name         Status
========  =======================  ====  ================  ======
1:33:04   GET /Styles/Site.css     443                     200    

And that's how every crawl from GoogleBot, msnbot, and BingBot works. Robots, login, secure, login. Never getting anywhere, because it cannot get past WebForms Authentication. And all is well with the world.

Until one day, GoogleBot shows up, with a Session cookie in hand!

Time      Uri                        Port  User Name            Status
========  =========================  ====  ===================  ======
1:49:21   GET /                      443   jatwood@example.com  200    ;they showed up logged in!
1:57:35   GET /ControlPanel.aspx     443   jatwood@example.com  200    ;now they're crawling that user's stuff!
1:57:35   GET /Defautl.aspx          443   jatwood@example.com  200    ;back to the homepage
2:07:21   GET /ViewTransaction.aspx  443   jatwood@example.com  200    ;and here comes the private information

The user, jatwood@example.com, had not been logged in for over a day. (I was hoping that IIS had given the same session identifier to two simultaneous visitors, separated by an application recycle.) And our site (web.config) is not configured to enable cookieless sessions. And the server (machine.config) is not configured to enable cookieless sessions.

So:

  • how did Google get ahold of a cookieless session?
  • how did Google get ahold of a valid cookieless session?
  • how did Google get ahold of a valid cookieless session that belonged to another user?

As recently as October 1 (4 days ago), the GoogleBot was still showing up, cookie in hand, logging in as this user, crawling, caching, and publishing some of their private details.

How is a non-malicious web-crawler bypassing WebForms authentication?

IIS7, Windows Server 2008 R2, single server.

The server is not configured to give out cookieless sessions. But ignoring that fact, how can Google bypass authentication?

  • GoogleBot is visiting the web-site, and attempting random usernames and passwords (not likely, the logs show no attempts to login)
  • GoogleBot decided to insert a random cookieless session into the url string, and it happened to match the session of an existing user (not likely)
  • The user managed to figure out how to make an IIS web-site return a cookieless url (not likely), then pasted that url onto another web-site (not likely), where Google found the cookieless url and crawled it
  • The user is running through a mobile proxy (which they're not). The proxy server doesn't support cookies, so IIS creates a cookieless session. That (e.g. Opera Mobile) caching server was breached (not likely) and all cached links posted on a hacker forum. GoogleBot crawled the hacker forum, and started following all links, including our jatwood@example.com cookieless session url.
  • The user has a virus, which manages to cajole any IIS web-server into handing back a cookieless url. That virus then reports back to headquarters. The urls are posted onto a publicly accessible resource that GoogleBot crawls. GoogleBot then shows up at our server with the cookieless url.

None of these are really plausible.

How can a non-malicious web-crawler bypass WebForms authentication, and hijack a user's existing session?

I don't even know how an ASP.net web-site that is not configured to give out cookieless sessions could give out a cookieless session. Is it possible to back-convert a cookie-based session id into a cookieless session id? I could quote the relevant <sessionState> section of web.config and machine.config, and show there is no presence of

<sessionState cookieless="true">

How does the web-server decide that the browser doesn't support cookies? I tried blocking cookies in Chrome, and I was never given a cookie-less session identifier. Can I simulate a browser that doesn't support cookies, in order to verify that my server is not giving out cookieless sessions?

Does the server decide cookieless sessions by User-Agent string? If so, I could set Internet Explorer with a spoofed UA.
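
One way to check this without touching browser settings is to send a single request with a chosen User-Agent, no cookies, and redirects left unfollowed, then look at where the server tries to send you. The Python sketch below does that. It is only an illustration: the helper names (NoRedirect, build_request, looks_cookieless, probe) are mine, the "/(" ... "))/" test is a rough heuristic for an ASP.NET-style URL segment, and which User-Agent strings the server treats as cookie-incapable is something you would have to find out from your own logs.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so the Location header can be inspected."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # returning None makes urllib raise HTTPError for the 3xx


def build_request(url, user_agent):
    """Build a request that carries no cookies and a caller-chosen User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def looks_cookieless(location):
    """Heuristic: True if a redirect target contains a /(X(...))/ style segment."""
    return "/(" in location and "))/" in location


def probe(url, user_agent, timeout=10):
    """Request url once; return the redirect target and whether it embeds a token."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(build_request(url, user_agent), timeout=timeout) as resp:
            location = resp.headers.get("Location") or ""
    except urllib.error.HTTPError as err:  # unfollowed 3xx responses land here
        location = err.headers.get("Location") or ""
    return location, looks_cookieless(location)
```

Calling probe with the site's root URL and the exact User-Agent string a crawler sent would show whether the login redirect comes back with an identifier embedded in the path. As far as I know a forms authentication ticket is only issued after a successful login, so an unauthenticated probe like this can at best reveal a cookieless session id, not a ticket.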

Does session identity in ASP.net depend solely on the cookie? Can anyone, from any IP, with the cookie-url, access that session? Does ASP.net not, by default, also take the IP address into account?

If ASP.net does tie IP address with the session, wouldn't that mean that the session couldn't have originated from the employee at their home computer? Because then when the GoogleBot crawler tried to use it from a Google IP, it would have failed?

Have there been any instances anywhere (besides the one I linked) of ASP.net giving out cookieless sessions when it's not configured to? Is there a Microsoft Connect issue on this?

Is Web-Forms authentication known to have issues, and should it not be used for security?

Edit: Removed the name of the bot that bypassed privilege, as people were confusing the name of the crawler for something else. I used the name of the crawler as a reminder that it was a non-malicious web-crawler that managed to crawl its way into another user's WebForms session. This is to contrast it with a malicious crawler that was trying to break into another user's session. Nothing like a pedant to bring out the aggravation.

Recommended answer

Though the question mainly references session identifiers, the length of the identifier struck me as unusual.

There are at least two types of cookie/cookieless operations that can modify the URL to include an ID.

  • Cookieless sessions
  • Cookieless forms authentication tokens

They are completely independent of each other (as far as I can tell).
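
The two kinds can be told apart from the URL itself. As far as I know, ASP.NET wraps cookieless values in a path segment of the form /(S(...))/ for a session ID and /(F(...))/ for a forms authentication ticket, which matches the (F( prefix in the cached URL from the question. A minimal Python sketch of that check follows; the function name, the KINDS table, and the abc123 placeholder value are my own and not part of ASP.NET or the original post.

```python
import re
from urllib.parse import urlsplit

# Matches a path segment such as /(F(ticket))/ or /(S(id)F(ticket))/.
_SEGMENT = re.compile(r"/\(((?:[A-Z]\([^()]*\))+)\)(?=/|$)")
_PAIR = re.compile(r"([A-Z])\(([^()]*)\)")

# Assumed meaning of each prefix letter (an assumption, not from the post).
KINDS = {"S": "session id", "F": "forms authentication ticket"}


def cookieless_tokens(url):
    """Return {kind: value} for every cookieless value embedded in the URL path."""
    match = _SEGMENT.search(urlsplit(url).path)
    if match is None:
        return {}
    return {KINDS.get(letter, letter): value
            for letter, value in _PAIR.findall(match.group(1))}


# abc123 is a made-up placeholder, not a real ticket.
print(cookieless_tokens(
    "https://example.com/(F(abc123))/ViewTransaction.aspx?transactionNumber=12345"))
print(cookieless_tokens(
    "https://example.com/ViewTransaction.aspx?transactionNumber=12345"))
```

The first call reports a forms authentication ticket and the second returns an empty dict, which is the distinction that matters here: an F value carries the login itself, whereas an S value only keys into session state.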

A cookieless session allows the server to access session state data based on a unique ID in the URL versus a unique ID in a cookie. This is usually considered a fine practice, though ASP.Net reuses session IDs which makes it more prone to session fixation attempts (separate topic but worth knowing about).

Does session identity in ASP.net depend solely on the cookie? Can anyone, from any IP, with the cookie-url, access that session? Does ASP.net not, by default, also take the IP address into account?

Only the session ID is required.

General session security reading

Based on the length of the example data, I'm guessing your URL actually contains a forms authentication value, not a session ID. The source code suggests that cookieless mode is not something you must explicitly enable.

/// <summary>ASP.NET determines whether to use cookies based on
/// <see cref="T:System.Web.HttpBrowserCapabilities" /> setting. 
/// If the setting indicates that the browser or device supports cookies, 
/// cookies are used; otherwise, an identifier is used in the query string.</summary>
UseDeviceProfile

Here is the method that makes the determination:

// System.Web.Security.CookielessHelperClass
internal static bool UseCookieless( HttpContext context, bool doRedirect, HttpCookieMode cookieMode )
{
    switch( cookieMode )
    {
        case HttpCookieMode.UseUri:
            return true;
        case HttpCookieMode.UseCookies:
            return false;
        case HttpCookieMode.AutoDetect:
            {
                // omitted for length
                return false;
            }
        case HttpCookieMode.UseDeviceProfile:
            if( context == null )
            {
                context = HttpContext.Current;
            }
            return context != null && ( !context.Request.Browser.Cookies || !context.Request.Browser.SupportsRedirectWithCookie );
        default:
            return false;
    }
}

Guess what the default is? HttpCookieMode.UseDeviceProfile. ASP.Net maintains a list of devices and capabilities. This list is generally a very bad thing; for example, IE11 gives a false positive for being a downlevel browser on par with Netscape 4.

I think Gene's explanation is very likely; Google found the URL from some user action and crawled it.

It's completely conceivable that the Google bot is deemed to not support cookies. But this doesn't explain the origin of the URL, i.e. what user action resulted in Google seeing a URL with an ID already in it? A simple explanation could be a user with a browser that was deemed to not support cookies. Depending on the browser, everything else could look fine to the user.

The timing, i.e. the duration of validity seems long, though I'm not that familiar with how long the authentication tickets are valid and under what circumstances they could be renewed. It's entirely possible ASP.Net continued to reissue/renew tickets as it would do for a continually active user.

I'm making a lot of assumptions here, but if I'm correct:

  • First, reproduce the behavior in your environment.
  • Explicitly disable cookieless behavior by using HttpCookieMode.UseCookies.

web.config:

 <authentication mode="Forms">
    <forms loginUrl="~/Account/Login.aspx" name=".ASPXFORMSAUTH" timeout="26297438"
           cookieless="UseCookies" />
 </authentication>

While this should resolve the behavior, you might investigate extending the forms authentication HTTP module and adding additional validation (or at least logging/diagnostics).
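
On the timing question raised earlier: the timeout attribute of the forms element is measured in minutes, and the value in the web.config snippet above is very large. The short Python calculation below is my own illustration, not part of the original answer; it simply converts that number into days and years.

```python
from datetime import timedelta

# timeout value copied from the <forms> element above; the attribute is in minutes.
FORMS_TIMEOUT_MINUTES = 26297438

lifetime = timedelta(minutes=FORMS_TIMEOUT_MINUTES)
years = lifetime.days / 365.25  # approximate, using an average year length

print(f"{lifetime.days} days, roughly {years:.1f} years")
```

That works out to about 50 years. If the production site uses the same value as this snippet (an assumption on my part), a ticket issued in late May would still be valid in October without any renewal at all, which would account for the long period of validity.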
