Python: How to use requests library to access a url through several different proxy servers?
Question
As it says in the title, I am trying to access a url through several different proxies sequentially (using for loop). Right now this is my code:
import requests
import json

with open('proxies.txt') as proxies:
    for line in proxies:
        proxy = json.loads(line)
        with open('urls.txt') as urls:
            for line in urls:
                url = line.rstrip()
                data = requests.get(url, proxies={'http': line})
                data1 = data.text
                print data1
and my urls.txt file:
http://api.exip.org/?call=ip
and my proxies.txt file:
{"https": "84.22.41.1:3128"}
{"http":"194.126.181.47:81"}
{"http":"218.108.170.170:82"}
that I got at www.hidemyass.com.
For some reason, the output is
68.6.34.253
68.6.34.253
68.6.34.253
as if it is accessing that website through my own router's IP address. In other words, it is not trying to access the site through the proxies I give it; it is just looping through and using my own address over and over again. What am I doing wrong?
There are two obvious problems right here:
data=requests.get(url, proxies={'http':line})
First, because you have a for line in urls: inside the for line in proxies:, line is going to be the current URL here, not the current proxy. And besides, even if you weren't reusing line, it would be the JSON string representation, not the dict you decoded from JSON.
Then, even if you fix that to use proxy, you're passing {'http': {'https': '83.22.41.1:3128'}} instead of something like {'https': '83.22.41.1:3128'}, and that obviously isn't a valid value.
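To make the shape mismatch concrete, here is a small sketch (in Python 3 syntax) comparing the decoded dict, which is what requests expects for its proxies parameter, with the nested dict the original loop builds:

```python
import json

line = '{"http": "194.126.181.47:81"}'

# What requests expects: a flat mapping of scheme -> proxy address.
proxy = json.loads(line)    # {'http': '194.126.181.47:81'}

# What the original code builds: a dict nested inside another dict,
# which is not a valid value for the proxies parameter.
wrong = {'http': proxy}     # {'http': {'http': '194.126.181.47:81'}}

print(proxy)
print(wrong)
```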
To fix both of those problems, just do this:
data = requests.get(url, proxies=proxy)
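Putting both fixes together, the whole loop might look like this. This is a sketch in Python 3 syntax (the question's code is Python 2); it keeps the file names from the question, and reading the URL list once up front instead of reopening the file for every proxy is an added tidy-up, not part of the original:

```python
import json

import requests


def fetch_via_proxies(proxies_path='proxies.txt', urls_path='urls.txt'):
    # Read the URL list once, instead of reopening the file per proxy.
    with open(urls_path) as f:
        urls = [line.rstrip() for line in f if line.strip()]

    with open(proxies_path) as f:
        for line in f:
            # Decode the JSON line into a dict, e.g. {'http': '194.126.181.47:81'}.
            proxy = json.loads(line)
            for url in urls:
                # Pass the decoded dict itself, not the raw JSON string
                # and not a dict nested inside another dict.
                data = requests.get(url, proxies=proxy)
                print(data.text)
```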
Meanwhile, what happens when you have an HTTPS URL, but the current proxy is an HTTP proxy? You're not going to use the proxy. So you probably want to add something to skip over them, like this:
if urlparse.urlparse(url).scheme not in proxy:
    continue
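The scheme check can be exercised on its own. A sketch in Python 3, where urlparse lives in the urllib.parse module (in the question's Python 2 it is the top-level urlparse module), using the proxy entries from the question:

```python
from urllib.parse import urlparse

proxies = [
    {'https': '84.22.41.1:3128'},
    {'http': '194.126.181.47:81'},
]
url = 'http://api.exip.org/?call=ip'

# Keep only proxies whose key matches the URL's scheme ('http' here),
# so an HTTPS-only proxy is skipped rather than silently left unused.
usable = [p for p in proxies if urlparse(url).scheme in p]
print(usable)
```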