How can I get a new IP from Tor for every request in threads?
Problem description
I am trying to use a Tor proxy for scraping. Everything works fine in one thread, but it is slow, so I tried something simple:
    import time
    from multiprocessing import Pool

    import requests
    from stem import Signal
    from stem.control import Controller

    def get_new_ip():
        # Ask Tor (control port 9051) for a new identity, then wait out the rate limit.
        with Controller.from_port(port=9051) as controller:
            controller.authenticate(password="password")
            controller.signal(Signal.NEWNYM)
            time.sleep(controller.get_newnym_wait())

    def check_ip():
        get_new_ip()
        session = requests.session()
        session.proxies = {'http': 'socks5h://localhost:9050', 'https': 'socks5h://localhost:9050'}
        r = session.get('http://httpbin.org/ip')
        print(r.text)

    with Pool(processes=3) as pool:
        for _ in range(9):
            pool.apply_async(check_ip)
        pool.close()
        pool.join()
When I run it, I see this output:
{"origin": "95.179.181.1, 95.179.181.1"}
{"origin": "95.179.181.1, 95.179.181.1"}
{"origin": "95.179.181.1, 95.179.181.1"}
{"origin": "151.80.53.232, 151.80.53.232"}
{"origin": "151.80.53.232, 151.80.53.232"}
{"origin": "151.80.53.232, 151.80.53.232"}
{"origin": "145.239.169.47, 145.239.169.47"}
{"origin": "145.239.169.47, 145.239.169.47"}
{"origin": "145.239.169.47, 145.239.169.47"}
Why is this happening, and how do I give each thread its own IP? By the way, I tried libraries like TorRequests and TorCtl, and the result is the same.
I understand that Tor seems to have a delay before issuing a new IP, but why does the same IP end up in different processes?
Recommended answer
If you want a different IP for each connection, you can also use Stream Isolation over SOCKS by specifying a different proxy username:password combination for each connection.
With this method you only need one Tor instance, and each requests client can use a different stream with a different exit node.
To set this up, add unique proxy credentials to each requests.session object, like so: socks5h://username:password@localhost:9050
    import random
    from multiprocessing import Pool

    import requests

    def check_ip():
        session = requests.session()
        # A random SOCKS username gives each session its own credentials.
        creds = str(random.randint(10000, 0x7fffffff)) + ":" + "foobar"
        session.proxies = {'http': 'socks5h://{}@localhost:9050'.format(creds), 'https': 'socks5h://{}@localhost:9050'.format(creds)}
        r = session.get('http://httpbin.org/ip')
        print(r.text)

    with Pool(processes=8) as pool:
        for _ in range(9):
            pool.apply_async(check_ip)
        pool.close()
        pool.join()
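Since the question mentions threads, the same idea can also be sketched with a thread pool. This is a minimal sketch, not the answerer's code. The helper name isolated_proxies, the use of secrets.token_hex for the credentials, and the 30-second timeout are assumptions made for illustration. It also assumes that Tor is isolating streams by SOCKS credentials on its SOCKS port and that requests is installed with SOCKS support (requests[socks]). Separate circuits can still happen to share an exit relay, so distinct IPs are likely but not guaranteed.

```python
import secrets
from concurrent.futures import ThreadPoolExecutor

TOR_SOCKS = "localhost:9050"  # SOCKS address used in the answer above


def isolated_proxies(socks_addr=TOR_SOCKS):
    """Build a requests-style proxies dict with fresh random SOCKS credentials.

    The credentials are not secrets; they only act as a stream-isolation key.
    """
    user = secrets.token_hex(8)
    password = secrets.token_hex(8)
    proxy = f"socks5h://{user}:{password}@{socks_addr}"
    return {"http": proxy, "https": proxy}


def check_ip(_task_id):
    # Imported lazily so isolated_proxies() can be used without requests installed.
    import requests

    # One session per task, each with its own credentials.
    with requests.Session() as session:
        session.proxies = isolated_proxies()
        return session.get("http://httpbin.org/ip", timeout=30).text


def main(workers=3, tasks=9):
    # Threads share one process, so results can simply be returned and printed here.
    with ThreadPoolExecutor(max_workers=workers) as executor:
        for body in executor.map(check_ip, range(tasks)):
            print(body)
```

Calling main() with a local Tor instance running should print one httpbin response per task. Creating the credentials inside check_ip, rather than once per worker, is what gives every request its own stream.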
Tor Browser isolates streams on a per-domain basis by setting the credentials to firstpartydomain:randompassword, where randompassword is a random nonce for each unique first-party domain.
If you are crawling a single site and want random IPs, use a random username:password combination for each session. If you are crawling many different domains and want all requests to a given domain to share a circuit, use Tor Browser's domain:randompassword method for the credentials.
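The per-domain variant could be sketched as follows. This is only an approximation of what Tor Browser does: the function name proxies_for and the in-memory _nonces dictionary are hypothetical, and the sketch keys on the URL's host name rather than on the true first-party domain.

```python
import secrets
from urllib.parse import quote, urlsplit

# Hypothetical in-memory map: host name -> random nonce used as the SOCKS password.
_nonces = {}


def proxies_for(url, socks_addr="localhost:9050"):
    """Return a proxies dict whose SOCKS credentials are stable per host name."""
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"URL has no host name: {url!r}")
    if host not in _nonces:
        _nonces[host] = secrets.token_hex(16)
    # Percent-encode the host so it is always a valid URL username.
    user = quote(host, safe="")
    proxy = f"socks5h://{user}:{_nonces[host]}@{socks_addr}"
    return {"http": proxy, "https": proxy}
```

A session would then use session.proxies = proxies_for(url) before each request. Note that the dictionary lives in one process, so with multiprocessing.Pool each worker would keep its own nonces. Sharing circuits across workers would need a shared mapping or credentials derived deterministically from the domain.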