How to use Python to retrieve an XML page that requires HTTP login?

Question

When I request a page on an IIS server through the browser to retrieve XML, passing query parameters (using the URL in the example below), a pop-up login dialog asks for a username and password (it appears to be a standard system dialog). Once I submit the credentials, the data arrives as an XML page.

How do I handle this with urllib? When I run the following, I am never prompted for a username and password; I just get a traceback indicating that the server (correctly) identifies me as unauthorized. I am using Python 2.7 in an IPython notebook.

import urllib

f = urllib.urlopen("http://www.nalmls.com/SERetsHuntsville/Search.aspx?SearchType=Property&Class=RES&StandardNames=0&Format=COMPACT&Query=(DATE_MODIFIED=2012-09-28T00:00:00%2B)&Limit=10")
s = f.read()
f.close()

Pointers to documentation are also appreciated! I did not find this exact use case.

I plan to parse the XML to CSV, if that makes a difference.

Recommended answer

You are dealing with HTTP authentication. I've always found it tricky to get working quickly with the urllib library. The requests Python package makes it super simple.

import requests

url = "http://www.nalmls.com/SERetsHuntsville/Search.aspx?SearchType=Property&Class=RES&StandardNames=0&Format=COMPACT&Query=(DATE_MODIFIED=2012-09-28T00:00:00%2B)&Limit=10"
r = requests.get(url, auth=('user', 'pass'))
page = r.text

If you look at the headers for that URL, you can see that it is using digest authentication:


{'content-length': '1893', 'x-powered-by': 'ASP.NET', 'x-aspnet-version': '4.0.30319', 'server': 'Microsoft-IIS/7.5', 'cache-control': 'private', 'date': 'Fri, 05 Oct 2012 18:20:54 GMT', 'content-type': 'text/html; charset=utf-8', 'www-authenticate': 'Digest realm="Solid Earth", nonce="MTAvNS8yMDEyIDE6MjE6MjUgUE0", opaque="0000000000000000", stale=false, algorithm=MD5, qop="auth"'}
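
As a rough illustration of reading the scheme off those headers, the sketch below takes the first token of the `www-authenticate` value. The helper name `auth_scheme` is hypothetical, and it assumes the header carries a single challenge, as in the response above; a server that offers several schemes in one header would need a fuller parser.

```python
def auth_scheme(headers):
    """Return the scheme named in a WWW-Authenticate header, or None."""
    # Header names are case-insensitive, so normalise the keys first.
    lowered = {name.lower(): value for name, value in headers.items()}
    challenge = lowered.get("www-authenticate")
    if not challenge:
        return None
    # The scheme is the first whitespace-delimited token, e.g. "Digest" or "Basic".
    return challenge.split(None, 1)[0]


# Two of the headers from the response shown above.
headers = {
    "server": "Microsoft-IIS/7.5",
    "www-authenticate": (
        'Digest realm="Solid Earth", nonce="MTAvNS8yMDEyIDE6MjE6MjUgUE0", '
        'opaque="0000000000000000", stale=false, algorithm=MD5, qop="auth"'
    ),
}
print(auth_scheme(headers))  # prints: Digest
```

With requests, the same dictionary is available as `r.headers` on the response to an unauthenticated request.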

So you need:

import requests
from requests.auth import HTTPDigestAuth

r = requests.get(url, auth=HTTPDigestAuth('user', 'pass'))
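
Since the question asked about urllib specifically, here is a minimal standard-library sketch of the same idea. It is written for Python 3's `urllib.request`; on Python 2.7, as used in the question, the same classes live in `urllib2`. The helper names `build_digest_opener` and `fetch_xml` are hypothetical, and decoding the body as UTF-8 is an assumption about the server's response.

```python
import urllib.request


def build_digest_opener(url, username, password):
    """Return an opener that answers Digest challenges for `url`."""
    # Passing None as the realm makes the credentials apply to whatever
    # realm the server announces (e.g. "Solid Earth").
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, url, username, password)
    handler = urllib.request.HTTPDigestAuthHandler(password_mgr)
    return urllib.request.build_opener(handler)


def fetch_xml(url, username, password):
    """Fetch `url` with Digest credentials and return the body as text."""
    opener = build_digest_opener(url, username, password)
    with opener.open(url) as response:
        # Assumes a UTF-8 body; adjust if the server declares another charset.
        return response.read().decode("utf-8")
```

Calling `fetch_xml(url, 'user', 'pass')` with the search URL from the question and real credentials should then return the XML as a string, ready for parsing.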
