How to use two-level proxy settings in Python?


Question

I am working on a web crawler (using Python).

The situation is this: I am behind server-1 and use a proxy setting to connect to the outside world. In Python, I can fetch URLs through that proxy with a proxy handler. However, I am building a crawler, so I cannot use only one IP (otherwise I will be blocked). To solve this, I have a pool of proxies that I want to rotate through.

My question is: this is a two-level proxy setup. The first level is the proxy I must use to get out from behind server-1; the second level is the pool of proxies I want to rotate through after that. How can I achieve this?

Recommended answer

Update: It sounds like you want to connect to proxy A and, from there, initiate HTTP connections via proxies B, C, and D, which are outside of A. You might look into the proxychains project, which says it can "tunnel any protocol via a user-defined chain of TOR, SOCKS 4/5, and HTTP proxies".

Version 3.1 is available as a package in Ubuntu Lucid. If it doesn't work directly for you, the proxychains source code may provide some insight into how this capability could be implemented in your app.
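For a rough idea of what chaining two HTTP proxies involves, here is a minimal Python 3 sketch. It is not taken from proxychains. It assumes proxy A accepts an HTTP CONNECT request to proxy B's port (many proxies restrict CONNECT to specific ports), that no proxy authentication is needed, and that the target is a plain http:// URL. The function names and any addresses passed to them are hypothetical.

```python
import socket
from urllib.parse import urlsplit


def build_connect_request(host, port):
    """Bytes of an HTTP CONNECT request asking a proxy to tunnel to host:port."""
    target = "{}:{}".format(host, port)
    return "CONNECT {0} HTTP/1.1\r\nHost: {0}\r\n\r\n".format(target).encode("ascii")


def connect_succeeded(head):
    """True if the status line of the proxy's reply carries a 2xx code."""
    parts = head.split(b"\r\n", 1)[0].split()
    return len(parts) >= 2 and parts[1][:1] == b"2"


def read_response_head(sock):
    """Read until the blank line that ends the HTTP response headers."""
    data = b""
    while b"\r\n\r\n" not in data:
        chunk = sock.recv(4096)
        if not chunk:
            break
        data += chunk
    return data


def fetch_via_chain(first_proxy, second_proxy, url, timeout=10):
    """Fetch a plain-http URL through first_proxy, then second_proxy.

    Both proxies are (host, port) tuples (hypothetical values supplied by
    the caller). Returns the raw HTTP response bytes.
    """
    sock = socket.create_connection(first_proxy, timeout=timeout)
    try:
        # Step 1: ask the first proxy for a raw TCP tunnel to the second proxy.
        sock.sendall(build_connect_request(*second_proxy))
        head = read_response_head(sock)
        if not connect_succeeded(head):
            raise IOError("first proxy refused the tunnel: %r" % head[:80])
        # Step 2: the socket now reaches the second proxy, so send it an
        # ordinary proxy-style request with the absolute URL.
        host = urlsplit(url).netloc
        request = "GET {} HTTP/1.1\r\nHost: {}\r\nConnection: close\r\n\r\n".format(url, host)
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        sock.close()
```

A call might look like `fetch_via_chain(("proxy-a.example", 3128), ("proxy-b.example", 8080), "http://example.com/")`, where both proxy addresses are placeholders. To rotate, pick a different second proxy for each request.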

Original answer: Check out urllib2.ProxyHandler. Here is an example of how you can use several different proxies to open URLs:

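import random

try:
    import urllib.request as urllib2  # Python 3: urllib2 was merged into urllib.request
except ImportError:
    import urllib2  # Python 2

# put the urls for all of your proxies in a list
proxies = ['http://localhost:8080/']

# construct your list of url openers which each use a different proxy
# (with this mapping, only http:// requests are routed through the proxy)
openers = []
for proxy in proxies:
    opener = urllib2.build_opener(urllib2.ProxyHandler({'http': proxy}))
    openers.append(opener)

# the page to fetch (placeholder; the original snippet left `url` undefined)
url = 'http://example.com/'

# select a url opener randomly, round-robin, or with some other scheme
opener = random.choice(openers)
req = urllib2.Request(url)
res = opener.open(req)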
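
The round-robin option mentioned in the last comment could be sketched as follows, assuming Python 3 (`urllib.request` in place of `urllib2`). The helper names `make_openers` and `fetch_with_rotation` are made up for this illustration, and the retry-on-failure behavior is an addition, not part of the original answer.

```python
import itertools
import urllib.request


def make_openers(proxy_urls):
    """Build one opener per proxy URL, proxying both http and https requests."""
    return [
        urllib.request.build_opener(
            urllib.request.ProxyHandler({"http": p, "https": p})
        )
        for p in proxy_urls
    ]


def fetch_with_rotation(url, opener_cycle, attempts, timeout=10):
    """Try up to `attempts` openers in round-robin order; return the body bytes.

    `opener_cycle` is an iterator such as itertools.cycle(openers), so
    successive calls keep advancing through the proxy pool.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        opener = next(opener_cycle)
        try:
            return opener.open(url, timeout=timeout).read()
        except OSError as exc:  # urllib.error.URLError is a subclass of OSError
            last_error = exc
    raise last_error
```

Typical use would be `pool = itertools.cycle(make_openers(proxies))` once at startup, then `fetch_with_rotation(url, pool, attempts=len(proxies))` per page, so that a dead proxy is skipped instead of failing the fetch.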
