Python: How to use the requests library to access a URL through several different proxy servers?


Question

As the title says, I'm trying to access a URL through several different proxies sequentially (using a for loop). Right now, this is my code:

import requests
import json
with open('proxies.txt') as proxies:
    for line in proxies:
        proxy=json.loads(line)
        with open('urls.txt') as urls:
            for line in urls:
                url=line.rstrip()
                data=requests.get(url, proxies={'http':line})
                data1=data.text
                print(data1)

My urls.txt file:

http://api.exip.org/?call=ip

And my proxies.txt file:

{"https": "84.22.41.1:3128"}
{"http":"194.126.181.47:81"}
{"http":"218.108.170.170:82"}

which I got from www.hidemyass.com.

For some reason, the output is:

68.6.34.253
68.6.34.253
68.6.34.253

It's as if it is accessing that website through my own router's IP address. In other words, it isn't going through the proxies I give it; it just loops and uses my own IP over and over again. What am I doing wrong?

Answer

There are two obvious problems right here:

data=requests.get(url, proxies={'http':line})

First, because you have a for line in urls: inside the for line in proxies:, line is going to be the current URL here, not the current proxy. And besides, even if you weren't reusing line, it would be the JSON string representation, not the dict you decoded from JSON.
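A minimal, network-free sketch of the first problem: it replays the question's nested loops over in-memory copies of proxies.txt and urls.txt (io.StringIO stands in for the real files, an assumption made so nothing is downloaded) and records what line holds at the point where requests.get would be called.

```python
import io
import json

# In-memory stand-ins for the question's proxies.txt and urls.txt
proxies_file = io.StringIO(
    '{"https": "84.22.41.1:3128"}\n'
    '{"http":"194.126.181.47:81"}\n'
    '{"http":"218.108.170.170:82"}\n'
)
urls_text = "http://api.exip.org/?call=ip\n"

seen = []  # (value of `line` where the request would be made, decoded proxy)
for line in proxies_file:
    # Here `line` is still the raw JSON text (a str); `proxy` is the decoded dict
    proxy = json.loads(line)
    for line in io.StringIO(urls_text):
        # The inner loop has rebound `line`: it now holds the URL, not the proxy
        seen.append((line, proxy))

for value, proxy in seen:
    print(repr(value), "<- line; proxy was", proxy)
```

Every recorded value of line is the URL (with its trailing newline), even though the outer loop still visits all three proxies, because rebinding the name does not affect the outer iterator.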

Then, even if you fix that to use proxy, you'd be passing something like {'http': {'https': '84.22.41.1:3128'}} instead of {'https': '84.22.41.1:3128'}. That obviously isn't a valid value.

To fix both of those problems, just do this:

data=requests.get(url, proxies=proxy)
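Put together, the corrected loop might look like the following Python 3 sketch. The function names fetch_via_each_proxy and main are hypothetical, and taking the HTTP function as a get parameter is an assumption of this sketch so the loop can be tried without a network; in real use it is requests.get.

```python
import json


def fetch_via_each_proxy(proxy_lines, urls, get):
    """Request every URL through each proxy in turn, yielding (proxy, url, text).

    `get` is expected to behave like requests.get(url, proxies=...).
    """
    for proxy_line in proxy_lines:
        if not proxy_line.strip():
            continue  # tolerate blank lines in proxies.txt
        proxy = json.loads(proxy_line)  # the decoded dict, not the JSON string
        for url in urls:  # a distinct loop variable, so nothing is shadowed
            response = get(url, proxies=proxy)  # pass the dict itself
            yield proxy, url, response.text


def main():
    import requests  # third-party package; imported here so the helper has no hard dependency

    with open("urls.txt") as url_file:
        # Read the URLs once so the same list can be reused for every proxy
        urls = [line.strip() for line in url_file if line.strip()]
    with open("proxies.txt") as proxy_file:
        for proxy, url, text in fetch_via_each_proxy(proxy_file, urls, requests.get):
            print(proxy, url, text)
```

Calling main() from a script in the directory that holds both files prints one result per proxy/URL pair. Reading urls.txt into a list up front replaces reopening the file inside the outer loop. This version does not yet skip proxies whose scheme doesn't match the URL.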


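Meanwhile, what happens when you have an HTTPS URL but the current proxy is an HTTP proxy? No proxy matches that scheme, so the request goes out directly, without any proxy. (The same applies in reverse here: your URL is plain HTTP, so the first entry, {"https": "84.22.41.1:3128"}, is never used.) So you probably want to add something to skip over them, like this: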

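import urlparse  # Python 2 module; in Python 3 use urllib.parse instead

# Inside the URL loop: skip this URL if the proxy dict has no entry for its scheme
if urlparse.urlparse(url).scheme not in proxy:
    continue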
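To see which of the question's proxies would actually apply to its URL, here is a simplified, network-free sketch of the scheme-keyed lookup, written for Python 3 (urllib.parse replaces the Python 2 urlparse module). The helper proxy_for is hypothetical: requests itself also consults keys such as "all" and "scheme://host" as well as environment variables, which this sketch ignores.

```python
from urllib.parse import urlparse


def proxy_for(url, proxies):
    """Return the proxy entry whose key matches the URL's scheme, or None.

    Simplified: only the bare scheme key ("http" / "https") is checked.
    """
    return proxies.get(urlparse(url).scheme)


url = "http://api.exip.org/?call=ip"
proxy_list = [
    {"https": "84.22.41.1:3128"},
    {"http": "194.126.181.47:81"},
    {"http": "218.108.170.170:82"},
]

for proxy in proxy_list:
    chosen = proxy_for(url, proxy)
    if chosen is None:
        # No entry for "http": sending the request anyway would bypass the proxy
        print("skip", proxy)
        continue
    print("use", chosen)
```

With the question's data, only the first entry is skipped, since it defines a proxy for "https" only while the URL is plain HTTP.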
