How can I get a new IP from Tor for every request in threads?

Problem description

I'm trying to use a Tor proxy for scraping. Everything works fine in a single thread, but it is slow. So I tried something simple:

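import time
from multiprocessing import Pool

import requests
from stem import Signal
from stem.control import Controller


def get_new_ip():
    # Ask Tor's control port (9051) for a new identity (NEWNYM signal)
    with Controller.from_port(port=9051) as controller:
        controller.authenticate(password="password")
        controller.signal(Signal.NEWNYM)
        # Tor rate-limits NEWNYM; wait until another one would be honored
        time.sleep(controller.get_newnym_wait())


def check_ip():
    get_new_ip()
    session = requests.session()
    # Send HTTP and HTTPS through Tor's SOCKS port; socks5h resolves DNS via Tor too
    session.proxies = {'http': 'socks5h://localhost:9050', 'https': 'socks5h://localhost:9050'}
    r = session.get('http://httpbin.org/ip')
    print(r.text)


if __name__ == "__main__":
    with Pool(processes=3) as pool:
        for _ in range(9):
            pool.apply_async(check_ip)
        pool.close()
        pool.join()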

When I run it, I see the output:

{"origin": "95.179.181.1, 95.179.181.1"}
{"origin": "95.179.181.1, 95.179.181.1"}
{"origin": "95.179.181.1, 95.179.181.1"}
{"origin": "151.80.53.232, 151.80.53.232"}
{"origin": "151.80.53.232, 151.80.53.232"}
{"origin": "151.80.53.232, 151.80.53.232"}
{"origin": "145.239.169.47, 145.239.169.47"}
{"origin": "145.239.169.47, 145.239.169.47"}
{"origin": "145.239.169.47, 145.239.169.47"}

Why is this happening, and how do I give each thread its own IP? By the way, I have tried libraries such as TorRequests and TorCtl, and the result is the same.

I understand that Tor seems to have a delay before issuing a new IP, but why does the same IP end up in different processes?

Recommended answer

If you want different IPs for each connection, you can also use Stream Isolation over SOCKS by specifying a different proxy username:password combination for each connection.

With this method, you only need one Tor instance and each requests client can use a different stream with a different exit node.

In order to set this up, add unique proxy credentials for each requests.session object like so: socks5h://username:password@localhost:9050
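A small helper along these lines keeps the credential handling in one place. It is only a sketch: `random_credentials` and `tor_proxies` are hypothetical names, and it assumes Tor's SOCKS port is on localhost:9050 as in the question. The credential values are arbitrary; Tor does not verify them, it only uses them to decide which streams may share a circuit.

```python
import secrets
from urllib.parse import quote


def random_credentials():
    """Hypothetical helper: return a fresh (username, password) pair.

    Tor accepts any values; a unique pair per session isolates that session.
    """
    return secrets.token_hex(8), secrets.token_hex(8)


def tor_proxies(username, password, host="localhost", port=9050):
    """Hypothetical helper: build a requests-style proxies dict for Tor.

    Assumes Tor's SocksPort listens on host:port (9050 in the question).
    """
    # Percent-encode so characters such as ':' or '@' cannot break the URL
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    url = "socks5h://{}:{}@{}:{}".format(user, pwd, host, port)
    return {"http": url, "https": url}


# Usage with requests (needs the SOCKS extra: pip install requests[socks]):
#   session = requests.Session()
#   session.proxies = tor_proxies(*random_credentials())
```

Generating the pair with `secrets` rather than `random` is just a design choice here; any value that differs between sessions gives the same isolation.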

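import random
from multiprocessing import Pool

import requests  # socks5h:// needs the SOCKS extra: pip install requests[socks]


def check_ip():
    session = requests.session()
    # A unique SOCKS username:password puts this session's streams on their
    # own circuit (stream isolation), so it usually gets a different exit IP
    creds = str(random.randint(10000, 0x7fffffff)) + ":" + "foobar"
    session.proxies = {
        'http': 'socks5h://{}@localhost:9050'.format(creds),
        'https': 'socks5h://{}@localhost:9050'.format(creds),
    }
    r = session.get('http://httpbin.org/ip')
    print(r.text)


if __name__ == "__main__":
    with Pool(processes=8) as pool:
        for _ in range(9):
            pool.apply_async(check_ip)
        pool.close()
        pool.join()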

Tor Browser isolates streams on a per-domain basis by setting the credentials to firstpartydomain:randompassword, where randompassword is a random nonce for each unique first-party domain.

If you're crawling the same site and want random IPs, use a random username:password combination for each session. If you are crawling random domains and want requests to the same domain to share a circuit, use Tor Browser's domain:randompassword scheme for the credentials.
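Both strategies can be sketched as one function that returns the SOCKS credentials for a URL. This is an approximation under stated assumptions: `credentials_for` is a hypothetical helper, and it keys the per-domain nonce on the URL's full hostname, whereas Tor Browser uses the first-party domain (e.g. `example.com` for `www.example.com`), which properly requires a public-suffix list that is omitted here.

```python
import secrets
from urllib.parse import urlsplit

# Per-hostname nonces, created on first use and reused afterwards
_domain_nonces = {}


def credentials_for(url, per_domain=False):
    """Hypothetical helper: return (username, password) for Tor's SOCKS auth.

    per_domain=False: a fresh random pair on every call, so each session is
    isolated (for varied exit IPs when crawling one site).
    per_domain=True: (hostname, nonce) in the style of Tor Browser, so all
    requests to one host share a circuit. The full hostname stands in for
    the first-party domain (assumption: no public-suffix handling).
    """
    if not per_domain:
        return secrets.token_hex(8), secrets.token_hex(8)
    host = urlsplit(url).hostname or ""
    if host not in _domain_nonces:
        _domain_nonces[host] = secrets.token_hex(16)
    return host, _domain_nonces[host]
```

The returned pair goes into a socks5h://username:password@localhost:9050 proxy URL for the session. Note that the nonce cache lives in one process: with multiprocessing.Pool, each worker would pick its own nonce for the same domain, so the per-domain mode only keeps a domain on one circuit within a single process.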
