How to use a two-level proxy setting in Python?
Problem description
I am working on a web crawler (using Python).
The situation is this: I am behind server-1 and connect to the outside world through a proxy setting. In Python, I can fetch URLs using a proxy handler. But since I am building a crawler, I cannot use only one IP, or I will be blocked. To get around this, I have a bunch of proxies that I want to shuffle through.
My question is: this is a two-level proxy. The first level is the proxy I use to get out through server-1; the second level is the pool of proxies I want to shuffle through once I am past it. How can I achieve this?
Recommended answer
Update: It sounds like you want to connect to proxy A and from there initiate HTTP connections via proxies B, C, and D, which are outside of A. You might look into the proxychains project, which says it can "tunnel any protocol via a user-defined chain of TOR, SOCKS 4/5, and HTTP proxies".
Version 3.1 is available as a package in Ubuntu Lucid. If it doesn't work directly for you, the proxychains source code may provide some insight into how this capability could be implemented for your app.
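If you end up implementing the chaining yourself, one common way to get from proxy A to proxy B is the HTTP CONNECT method: open a TCP connection to A, ask A to tunnel to B, and then talk to B over the same socket. The following is only a rough Python 3 sketch of that idea, not how proxychains itself is written. The function names and the host names in the usage note are made up for illustration, and it assumes proxy A allows CONNECT to proxy B's port (many proxies only allow CONNECT to port 443).

```python
import socket


def build_connect_request(host, port):
    """Return the bytes of an HTTP CONNECT request for host:port."""
    target = "{}:{}".format(host, port)
    return "CONNECT {0} HTTP/1.1\r\nHost: {0}\r\n\r\n".format(target).encode("ascii")


def parse_status_code(response_head):
    """Extract the numeric status code from the first line of a proxy reply."""
    status_line = response_head.split(b"\r\n", 1)[0]
    parts = status_line.split(None, 2)
    if len(parts) < 2 or not parts[0].startswith(b"HTTP/"):
        raise ValueError("not an HTTP status line: %r" % status_line)
    return int(parts[1])


def open_tunnel(first_proxy, next_hop, timeout=10):
    """Connect to first_proxy and ask it to tunnel to next_hop.

    Both arguments are (host, port) tuples. On success the returned socket
    behaves like a direct TCP connection to next_hop.
    """
    sock = socket.create_connection(first_proxy, timeout=timeout)
    try:
        sock.sendall(build_connect_request(*next_hop))
        head = b""
        # Read until the end of the proxy's response headers.
        while b"\r\n\r\n" not in head:
            chunk = sock.recv(4096)
            if not chunk:
                raise IOError("proxy closed the connection during CONNECT")
            head += chunk
        code = parse_status_code(head)
        if not 200 <= code < 300:
            raise IOError("proxy refused CONNECT with status %d" % code)
    except Exception:
        sock.close()
        raise
    return sock
```

With hypothetical addresses, `open_tunnel(("proxy-a.internal", 3128), ("proxy-b.example", 8080))` would return a socket on which you can send an ordinary proxy-style request to B, for example `GET http://example.com/ HTTP/1.1` with a matching `Host` header. Rotating the second hop then amounts to passing a different `next_hop` for each connection.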
Original answer: Check out urllib2.ProxyHandler. Here is an example of how you can use several different proxies to open URLs:
import random

try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3 home of the same names

# put the urls for all of your proxies in a list
proxies = ['http://localhost:8080/']

# construct your list of url openers which each use a different proxy
openers = []
for proxy in proxies:
    opener = urllib2.build_opener(urllib2.ProxyHandler({'http': proxy}))
    openers.append(opener)

# select a url opener randomly, round-robin, or with some other scheme
opener = random.choice(openers)

# placeholder target; replace with the page you want to fetch
url = 'http://example.com/'
req = urllib2.Request(url)
res = opener.open(req)
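For the "round-robin, or with some other scheme" part, here is a minimal sketch of a rotating helper, written for Python 3's urllib.request. The class name `ProxyRotator`, its methods, and the proxy URLs in the usage note are hypothetical and not part of any library. It assumes every proxy in the list handles both http and https traffic, and that moving on to the next proxy is an acceptable response to a network error.

```python
import itertools
import urllib.request


class ProxyRotator:
    """Hand out one opener per proxy in round-robin order (hypothetical helper)."""

    def __init__(self, proxy_urls):
        self.proxy_urls = list(proxy_urls)
        if not self.proxy_urls:
            raise ValueError("at least one proxy URL is required")
        # Build each opener once so it can be reused across requests.
        openers = [
            urllib.request.build_opener(
                urllib.request.ProxyHandler({"http": p, "https": p})
            )
            for p in self.proxy_urls
        ]
        self._cycle = itertools.cycle(zip(self.proxy_urls, openers))

    def next_opener(self):
        """Return the next (proxy_url, opener) pair in rotation."""
        return next(self._cycle)

    def fetch(self, url, attempts=3, timeout=10):
        """Try up to `attempts` proxies in turn and return the response body."""
        if attempts < 1:
            raise ValueError("attempts must be at least 1")
        last_error = None
        for _ in range(attempts):
            _proxy_url, opener = self.next_opener()
            try:
                with opener.open(url, timeout=timeout) as res:
                    return res.read()
            except OSError as exc:
                # URLError and socket timeouts are OSError subclasses in Python 3.
                last_error = exc
        raise last_error
```

Usage would look something like `rotator = ProxyRotator(['http://localhost:8080/', 'http://localhost:8081/'])` followed by `body = rotator.fetch('http://example.com/')`. Keeping the rotation in one object makes it easy to swap in a different selection scheme later, such as random choice or dropping proxies that fail repeatedly.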