Use Python mechanize to log into pages with NTLM authentication
Problem description
I want to use mechanize to log into a page and retrieve some information. But however I try to authenticate, it just fails with HTTP error 401, as you can see below:
r = br.open('http://intra')
File "bui...e\_mechanize.py", line 203, in open
File "bui...g\mechanize\_mechanize.py", line 255, in _mech_open
mechanize._response.httperror_seek_wrapper: HTTP Error 401: Unauthorized
This is my code so far:
import mechanize
import cookielib
# Browser
br = mechanize.Browser()
# Cookie Jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
# Browser options
br.set_handle_equiv(True)
# br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)
# Follows refresh 0 but doesn't hang on refresh > 0
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
# If the protected site didn't receive the authentication data you would
# end up with a 401 error in your face
br.add_password('http://intra', 'myusername', 'mypassword')
# User-Agent (this is cheating, ok?)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
# Open some site, let's pick a random one, the first that pops in mind:
# r = br.open('http://google.com')
r = br.open('http://intra')
html = r.read()
# Show the source
print html
What am I doing wrong? When I visit http://intra (an internal page) with, e.g., Chrome, it pops open a window that asks for my username/password once, and then all is good.
The dialogue which pops open looks like this:
Recommended answer
After tons of research I managed to find out the reason behind this.
First of all, the site uses so-called NTLM authentication, which mechanize does not support. This can help to find out the authentication mechanism of a site (check the WWW-Authenticate header in the printed server response):
wget -O /dev/null -S http://www.the-site.com/
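The same check can be sketched in Python 3 with only the standard library: request the page without credentials and read the WWW-Authenticate headers of the 401 response. The function names and the BASIC_OR_DIGEST set are hypothetical, and two simplifications are assumed: one challenge per header line, and a client that (like urllib2's built-in handlers) only answers Basic or Digest challenges.

```python
import urllib.error
import urllib.request

# Assumption: schemes a Basic/Digest-only client knows how to answer.
BASIC_OR_DIGEST = ("basic", "digest")


def parse_auth_schemes(challenges):
    """Return the scheme name (first token) of each WWW-Authenticate value.

    Simplification: assumes one challenge per header line, e.g. "NTLM",
    "Negotiate" or 'Basic realm="intra"'.
    """
    schemes = []
    for value in challenges:
        parts = value.strip().split(None, 1)
        if parts:  # skip empty header values
            schemes.append(parts[0])
    return schemes


def can_answer(schemes, supported=BASIC_OR_DIGEST):
    """True if at least one offered scheme is in the supported set."""
    return any(scheme.lower() in supported for scheme in schemes)


def probe_auth_schemes(url, timeout=10):
    """Request url without credentials and list the schemes from a 401."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return []  # no 401: the page did not ask for credentials
    except urllib.error.HTTPError as err:
        if err.code != 401:
            raise
        return parse_auth_schemes(err.headers.get_all("WWW-Authenticate") or [])
```

For a site like the one in the question, probe_auth_schemes("http://intra") would be expected to list NTLM (possibly alongside Negotiate), and can_answer on that list returns False, which is consistent with the 401 that mechanize reports.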
So the code was modified a little bit:
import sys
import urllib2
import mechanize
from ntlm import HTTPNtlmAuthHandler
print("LOGIN...")
user = sys.argv[1]
password = sys.argv[2]
url = sys.argv[3]
passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, url, user, password)
# create the NTLM authentication handler
auth_NTLM = HTTPNtlmAuthHandler.HTTPNtlmAuthHandler(passman)
browser = mechanize.Browser()
# Drop mechanize's robots.txt processor; keep every other handler
handlersToKeep = []
for handler in browser.handlers:
    if not isinstance(handler, mechanize._http.HTTPRobotRulesProcessor):
        handlersToKeep.append(handler)
browser.handlers = handlersToKeep
browser.add_handler(auth_NTLM)
response = browser.open(url)
response = browser.open("http://www.the-site.com")
print(response.read())
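The code above registers the credentials with a None realm. In HTTPPasswordMgrWithDefaultRealm, credentials stored under None act as the fallback for every realm, so a lookup by URL finds them whatever realm name is used. That suits NTLM, whose "WWW-Authenticate: NTLM" challenge carries no realm. A small Python 3 check, using urllib.request (where Python 3 moved the urllib2 classes); the URL and credentials are placeholders:

```python
import urllib.request

# Placeholder URL and credentials (the answer's script reads these from sys.argv).
url = "http://intra/"
user = "myusername"
password = "mypassword"

passman = urllib.request.HTTPPasswordMgrWithDefaultRealm()
# realm=None stores the credentials as the fallback for any realm.
passman.add_password(None, url, user, password)

# A realm name that was never registered still finds the credentials,
# because the requested URL lies under the registered one.
found = passman.find_user_password("Some Realm", "http://intra/page.aspx")
print(found)

# A different host gets nothing back.
print(passman.find_user_password("Some Realm", "http://other-host/"))
```

The first print shows the stored ('myusername', 'mypassword') pair and the second shows (None, None).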
And finally, mechanize needs to be patched, as mentioned here (http://stackoverflow.com/questions/13649964):
--- _response.py.old 2013-02-06 11:14:33.208385467 +0100
+++ _response.py 2013-02-06 11:21:41.884081708 +0100
@@ -350,8 +350,13 @@
             self.fileno = self.fp.fileno
         else:
             self.fileno = lambda: None
-        self.__iter__ = self.fp.__iter__
-        self.next = self.fp.next
+
+        if hasattr(self.fp, "__iter__"):
+            self.__iter__ = self.fp.__iter__
+            self.next = self.fp.next
+        else:
+            self.__iter__ = lambda self: self
+            self.next = lambda self: self.fp.readline()
 
     def __repr__(self):
         return '<%s at %s whose fp = %r>' % (
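The idea behind the patch's fallback branch can be tried outside mechanize. In this Python 3 sketch (ReadlineOnly and LineIterator are hypothetical names), an object that offers readline() but no __iter__, the situation the patch guards against, is made iterable by calling readline() until it returns an empty bytes string, which is how file-like objects signal end of file.

```python
class ReadlineOnly:
    """Hypothetical stand-in for a response object without __iter__."""

    def __init__(self, lines):
        self._lines = list(lines)

    def readline(self):
        # Like a binary file object, return b"" once the data is exhausted.
        return self._lines.pop(0) if self._lines else b""


class LineIterator:
    """Iterate over fp, falling back to readline() when fp is not iterable."""

    def __init__(self, fp):
        self.fp = fp

    def __iter__(self):
        if hasattr(self.fp, "__iter__"):
            return iter(self.fp)
        # iter(callable, sentinel) calls fp.readline until it returns b"".
        return iter(self.fp.readline, b"")


lines = list(LineIterator(ReadlineOnly([b"first\n", b"second\n"])))
print(lines)
```

In Python 3, iter() looks up __iter__ on the class rather than on the instance, so the fallback is written as a method here instead of being assigned in __init__ as the Python 2 patch does. For a text-mode object the sentinel would be "" instead of b"".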