Python: authentication not recognised (urllib2, requests, ASP.NET)

Problem description

Though I'm not particularly advanced at any of this, I've had some past success using urllib2, requests and scrapy, but this has me stumped. So after much searching and banging my head against the keyboard, I'll just go ahead and ask.

I'd like to get the html source code of a site, but after supplying my username and password I keep getting a page back which says my username and password are wrong. They work fine in the browser, and once logged in the source code is readily available (via browser). But I can't seem to achieve the same result via python/terminal. I'll include some of my attempts (gleaned from these helpful pages) below:

using urllib2:

import base64
from urllib2 import Request, urlopen

# Send HTTP Basic credentials explicitly in the Authorization header.
req = Request(website, headers={'User-Agent': 'Mozilla/5.0'})
base64string = base64.b64encode('%s:%s' % (username, password))
req.add_header("Authorization", "Basic %s" % base64string)
readweb = urlopen(req).read()

another version:

import urllib2

# Let urllib2 answer the server's Basic-auth challenge for any realm.
passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, theurl, username, password)

authhandler = urllib2.HTTPBasicAuthHandler(passman)
opener = urllib2.build_opener(authhandler)

pagehandle = opener.open(theurl)
return pagehandle.read()

and an attempt using requests:

import requests

r = requests.session()
try:
    # Post the credentials as form data to the login URL.
    r.post(theurl, data={'username': 'username', 'password': 'password', 'remember': '1'})
except requests.RequestException:
    print('Sorry, Unable to...')
result = r.get(theurl)
return result.text

I've also tried to use scrapy, but regardless of which library I use, it comes back with the html of a page which says my password/details are wrong. I'm guessing it's something to do with the headers/authorisation(?) I'm sending, but I'm not overly sure. Any help is much appreciated. Please let me know what other details I can update this with (I've been up half the night with this, so if this post doesn't make sense please forgive me!)

EDIT:

Here's the traceback response to Prashant's answer below (minus the passwords etc.):

Traceback (most recent call last):

  File "/Users/Hatsaw/newpy/pras.py", line 3, in <module>
    r = requests.get(URL, auth=('username','password'))
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests-2.9.0-py2.7.egg/requests/api.py", line 67, in get
    return request('get', url, params=params, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests-2.9.0-py2.7.egg/requests/api.py", line 53, in request
    return session.request(method=method, url=url, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests-2.9.0-py2.7.egg/requests/sessions.py", line 468, in request
    resp = self.send(prep, **send_kwargs)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests-2.9.0-py2.7.egg/requests/sessions.py", line 576, in send
    r = adapter.send(request, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests-2.9.0-py2.7.egg/requests/adapters.py", line 437, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='website', port=80): Max retries exceeded with url: /dashboard/ (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known',))

EDIT:

Ok, I'm now using mechanize (recommended below), and here's what I'm getting back (not sure if this is another instance of my root problem or my inability with mechanize!):

Traceback (most recent call last):

  File "/Users/Hatsaw/newpy/pras2.py", line 13, in <module>
    browser.form['email'] = 'email address'
  File "build/bdist.macosx-10.6-intel/egg/mechanize/_form.py", line 2780, in __setitem__
  File "build/bdist.macosx-10.6-intel/egg/mechanize/_form.py", line 3101, in find_control
  File "build/bdist.macosx-10.6-intel/egg/mechanize/_form.py", line 3185, in _find_control
mechanize._form.ControlNotFoundError: no control matching name 'email'

EDIT:

Still struggling with this, so here's a last ditch effort before time runs out on this project and I have to go in and get all the html manually! Fingers crossed..

Ok, so on the advice of barny, I'm back to using requests, and I'm attempting to provide the request with cookie information that I've gleaned from a successful browser login. I'm not certain I'm doing this correctly, but I'm using:

cookies = {'PHPSESSID':'5udcifi6p43ma3h1fnpfqghiu0'}
result = sess.get(the_url, cookies=cookies)

Now I'm getting an Internal Server Error response. After some research, ASP.NET forms seem to be the problem:

I just want to check that I'm not doing something wrong with requests first, then perhaps I'll explore BeautifulSoup/robobrowser as recommended by Martijn Pieters in the SO link above.

Here's what the form section of the html is asking:

<form name="aspnetForm" method="post" action="" id="aspnetForm">
<div>
<input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" />
<input type="hidden" name="__EVENTARGUMENT" id="__EVENTARGUMENT" value="" />
<input type="hidden" name="__LASTFOCUS" id="__LASTFOCUS" value="" />
<input type="hidden" name="__VIEWSTATEFIELDCOUNT" id="__VIEWSTATEFIELDCOUNT" value="2" />
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwUKLTkwNzg1NTQ3OA9kFgJmD2QWAmYPZBYGAgetc." />
<input type="hidden" name="__VIEWSTATE1" id="__VIEWSTATE1"     value="ZyBBIEhvbWUVIE5lZ290aWF0ZSBBZ3JlZW1lbnRzEiBSZetc." />
</div>

<script type="text/javascript">
//<![CDATA[
var theForm = document.forms['aspnetForm'];
if (!theForm) {
theForm = document.aspnetForm;
}
function __doPostBack(eventTarget, eventArgument) {
if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
    theForm.__EVENTTARGET.value = eventTarget;
    theForm.__EVENTARGUMENT.value = eventArgument;
    theForm.submit();
}
}
//]]>
</script>


<script src="/WebResource.axd?d=t2SAOwDGkbrEfkmUaMOR9sPLXqgxfeenNayRja3DNK2R8JEcH-StTTuiaqXpzp--PAISn3vzVbWQ7biREwPkibCmbAE1&amp;t=635586505120000000" type="text/javascript"></script>


<script src="/ScriptResource.axd?d=EL6tXtJfNfGSoQwhYtVnYEqw4oKvuwBBI4etc."     type="text/javascript"></script>
<script type="text/javascript">
//<![CDATA[
if (typeof(Sys) === 'undefined') throw new Error('ASP.NET Ajax client-side framework failed to load.');
//]]>
</script>

<script src="/ScriptResource.axd?d=qCmNMcECQa0tfmMcZdwJeeOdcyetc." type="text/javascript"></script>
<div>

<input type="hidden" name="__VIEWSTATEGENERATOR" id="__VIEWSTATEGENERATOR" value="FC5C7135" />
<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="/wEdABB2xJRvPLCcg6GsBqRFCtw6Xg91QEu10etc." />
</div>

So. Some small questions.

  • Do my 'user/pass' field names have to match the source code, i.e. username = username or user? I've lost where I found this in the html now, but I found 'ctl00$cphMain$tbUsername' and 'ctl00$cphMain$tbPassword'…

  • Do I need to send the password and/or username as a base64.encodestring? (I don't know if this is a problem, but the password contains chars such as !@$ etc.)

  • Do I need to add ALL of the cookie fields I've found from the browser or just the PHPSESSID? Here are the fields I've got in the cookies:

ASP.NET_SessionId, CFID, CFTOKEN, __atuvc, __utma, __utmb, __utmc, __utmt, __utmz, BRO_CALLME, BRO_ID, BRO_LOGIN, BRO_MEMBER, BROAUTH, ISFULLMEMBER, phpMBLink, __CT_Data, WRUID

  • There is the website (www.website.com), the login-page (www.website.com/login), and then the content (www.website.com/content). Am I correct in thinking I use the cookie from the (successfully logged in) login-page and 'send' it to the content page? Should I do this manually (enter field details from browser cookie information) or within the code (so, in code below I would use: cookies = r_login.cookies)?
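On the base64 question above: as far as I understand it, base64 only comes into play for HTTP Basic authentication, where the Authorization header carries base64("username:password"). An HTML login form is instead sent as URL-encoded form data, and characters such as !@$ are percent-escaped by the library. Here is a minimal sketch of the difference, written for Python 3 with the standard library only. The credentials are made-up placeholders, and the field names are the ones I found in the page source:

```python
import base64
from urllib.parse import urlencode, parse_qs

username = "username"      # placeholder
password = "p!@$word"      # placeholder containing special characters

# HTTP Basic auth: the header value is base64("user:pass").
token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
basic_header = {"Authorization": "Basic " + token}

# HTML form login: fields are URL-encoded, no base64 involved.
form_body = urlencode({
    "ctl00$cphMain$tbUsername": username,
    "ctl00$cphMain$tbPassword": password,
})

print(basic_header)
print(form_body)
```

Passing a dict as data= to requests performs the same URL-encoding, so special characters in the password should not need any manual treatment.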

Finally, here's the code I'm currently using, which returns an Internal Server Error:

import requests

the_url = 'the_url'
login = the_url + '/login'
content = the_url + '/content'
username = 'username'
password = 'password'

sess = requests.Session()
sess.auth = ('username', 'password')
sess.get(the_url)

payload = {'ctl00$cphMain$tbUsername': username, 'ctl00$cphMain$tbPassword': password}
r_login = sess.post(login, data=payload)

cookies = {'PHPSESSID':'5udcifi6p43ma3h1fnpfqghiu0', 'ASP.NET_SessionId':'aspnet', 'BRO_LOGIN':'bro_login'}
r_data = sess.get(content, cookies=cookies, data=payload)

print r_data.text
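For comparison, here is a rough sketch of what a browser does with a form like the one above: it sends back every hidden field (__VIEWSTATE, __EVENTVALIDATION and so on) together with the credentials, and reuses the same session for the content page. This is only a sketch under assumptions, written for Python 3. The names HiddenInputCollector, hidden_fields and login_and_fetch are hypothetical helpers, session is assumed to behave like requests.Session, and a real form may also expect the submit button's name/value pair, which is not handled here.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and (attrs.get("type") or "").lower() == "hidden":
            name = attrs.get("name")
            if name:
                self.fields[name] = attrs.get("value") or ""


def hidden_fields(html):
    """Return a dict of all hidden form fields found in the page."""
    collector = HiddenInputCollector()
    collector.feed(html)
    return collector.fields


def login_and_fetch(session, login_url, content_url, username, password):
    """Log in through the form, then fetch a content page with the same session."""
    login_page = session.get(login_url)
    payload = hidden_fields(login_page.text)   # state fields from the server
    payload["ctl00$cphMain$tbUsername"] = username
    payload["ctl00$cphMain$tbPassword"] = password
    session.post(login_url, data=payload)
    # The session object carries the cookies set during login.
    return session.get(content_url).text
```

With requests installed, this would be called as login_and_fetch(requests.Session(), login, content, username, password). Because a Session keeps the cookies the server sets, copying cookie values from the browser by hand should not be necessary.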

Apologies, this has gotten rather long now, if I need to split it up over several posts please let me know - what I assumed was a simple question at the outset has mutated into something else!

Solution

Victory!

Ok, with thanks to Prashant and barny for their responses, and a big thanks to Martijn Pieters via this post: Sending an ASP.net POST with Python's Requests

I've found my salvation to be robobrowser.

Here's the code:

from robobrowser import RoboBrowser

the_url = 'the_url'
login = the_url + '/login'
content = the_url + '/content'
username = 'username'
password = 'password'

browser = RoboBrowser(parser='lxml')

browser.open(login)
form = browser.get_forms()  

# You can use '.get_form()' for a specific form, but I find it easier to
# use '.get_forms()' to get all the forms, since I'm only interested
# in the first one:

form = form[0]
print form     # shows the field names you need in order to
               # enter your login details below:

form['the_user'].value = username
form['the_pass'].value = password

browser.submit_form(form)

# and then because I'm after the html of certain content pages:

browser.open(content)
source = str(browser.parsed)
return source
