How to scrape URL data from an intranet site using Python?
Problem description
I need a Python warrior to help me (I'm a noob)! I'm trying to scrape certain data from an intranet site using the urllib module. However, the site belongs to my company and is only available to employees, not to the public, which I think is why I get this error:
IOError: ('http error', 401, 'Unauthorized', )
How do I get around this? It won't even let me read the site with htmlfile.read().
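A server that answers 401 is required to name the authentication scheme(s) it accepts in one or more WWW-Authenticate response headers (for example Basic, NTLM or Negotiate), and that scheme decides which kind of credentials handler you need. The following is a minimal Python 3 sketch, using only the standard library, that requests a page without credentials and lists the schemes the server offers. auth_schemes and probe are hypothetical helper names, and the sketch assumes each challenge arrives in its own header line.

```python
import urllib.error
import urllib.request


def auth_schemes(header_values):
    """Return the scheme name (first token) of each WWW-Authenticate value.

    Assumption: each header value holds a single challenge such as 'NTLM'
    or 'Basic realm="intranet"'; comma-joined challenges are not split.
    """
    schemes = []
    for value in header_values or []:
        parts = value.split(None, 1)  # scheme is the first whitespace-separated token
        if parts:
            schemes.append(parts[0])
    return schemes


def probe(url):
    """Request url without credentials; return (status, offered auth schemes)."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status, []
    except urllib.error.HTTPError as err:
        # get_all returns None when the header is absent; auth_schemes handles that
        return err.code, auth_schemes(err.headers.get_all("WWW-Authenticate"))
```

Call it as probe("http://intranet.example.com/"), where the URL is a placeholder for your own site. If the list contains NTLM or Negotiate, the site uses Windows authentication; Basic means a plain user name and password.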
Sample code that works for a public site:
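import urllib
import re
# Python 2: urllib.urlopen returns a file-like object for the page
htmlfile = urllib.urlopen("http://finance.yahoo.com/q?s=AAPL")
htmltext = htmlfile.read()
# Capture the text inside the span that holds the AAPL quote
regex = '<span id="yfs_l84_aapl">(.+?)</span>'
pattern = re.compile(regex)
price = pattern.findall(htmltext)
print price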
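The sample above is Python 2 code. In Python 3, urllib.urlopen and the print statement no longer exist: the function is urllib.request.urlopen, and read() returns bytes that must be decoded before a text regex can search them. Here is a minimal Python 3 sketch of the same idea. fetch and extract_prices are hypothetical names, the span id is copied from the question, and the live Yahoo page may no longer contain that markup.

```python
import re
import urllib.request

# Span id copied from the question; the current page markup may differ.
PRICE_RE = re.compile(r'<span id="yfs_l84_aapl">(.+?)</span>')


def extract_prices(html):
    """Return every price string captured by PRICE_RE in an HTML document."""
    return PRICE_RE.findall(html)


def fetch(url):
    """Download url and decode the body to text (urlopen yields bytes in Python 3)."""
    with urllib.request.urlopen(url) as resp:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Usage: extract_prices(fetch("http://finance.yahoo.com/q?s=AAPL")). Keeping the download and the parsing in separate functions lets you test the regex against saved HTML without touching the network.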
Recommended answer
Try requests together with requests_ntlm, which adds NTLM (Windows integrated) authentication to requests:
import requests
from requests_ntlm import HttpNtlmAuth
# Credentials are 'DOMAIN\\user' (backslash escaped in the string literal) and the password
r = requests.get("http://ntlm_protected_site.com", auth=HttpNtlmAuth('domain\\username', 'password'))
print(r.text)
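requests_ntlm only helps when the intranet actually uses NTLM. If the server's 401 response instead advertises Basic authentication in its WWW-Authenticate header, Python 3's standard library can answer the challenge without third-party packages. Below is a minimal sketch under that assumption. open_with_basic_auth is a hypothetical helper name, and in real use the password would come from a prompt or an environment variable rather than being hard-coded.

```python
import urllib.request


def open_with_basic_auth(url, username, password):
    """Fetch url as text, answering a Basic-auth 401 challenge with the given credentials."""
    # realm=None: use these credentials for any realm the server names under this URL.
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, url, username, password)
    opener = urllib.request.build_opener(
        urllib.request.HTTPBasicAuthHandler(password_mgr)
    )
    # On a 401 with 'WWW-Authenticate: Basic realm=...', the handler retries
    # the request with an Authorization header built from the stored credentials.
    with opener.open(url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Basic authentication only base64-encodes the password, so it is best used over HTTPS. If the server asks for NTLM or Negotiate instead, this handler will not work and the requests_ntlm approach applies.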
If you need help with any specifics of this library and can't find it in the docs, leave a comment.