Using a Tor proxy with Scrapy
Question
I need help setting up Tor on Ubuntu and using it within the Scrapy framework.
I did some research and found this guide:
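import logging
import telnetlib  # removed from the standard library in Python 3.13
import time

from scrapy.downloadermiddlewares.retry import RetryMiddleware

logger = logging.getLogger(__name__)


class RetryChangeProxyMiddleware(RetryMiddleware):
    def _retry(self, request, reason, spider):
        logger.info('Changing proxy')
        # Ask Tor for a new circuit through its control port (9051).
        tn = telnetlib.Telnet('127.0.0.1', 9051)
        # This banner is printed by the telnet CLI client, not by Tor,
        # so this call only waits out its 2-second timeout.
        tn.read_until(b"Escape character is '^]'.", 2)
        tn.write(b'AUTHENTICATE "267765"\r\n')
        tn.read_until(b"250 OK", 2)
        tn.write(b"signal NEWNYM\r\n")
        tn.read_until(b"250 OK", 2)
        tn.write(b"quit\r\n")
        tn.close()
        time.sleep(3)  # give Tor a moment to build the new circuit
        logger.info('Proxy changed')
        return RetryMiddleware._retry(self, request, reason, spider)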
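For reference, the control-port exchange in the middleware above is just three plain-text lines, each answered by a reply that starts with `250` on success. The sketch below performs the same exchange with only the `socket` module, without telnetlib. The function names `control_commands` and `request_new_identity` are made up for this example, and it assumes the control port uses password authentication and that the password contains no quotes or backslashes.

```python
import socket


def control_commands(password):
    """Return the three control-port lines (as bytes) sent by the middleware."""
    return [
        'AUTHENTICATE "{}"\r\n'.format(password).encode("ascii"),
        b"SIGNAL NEWNYM\r\n",
        b"QUIT\r\n",
    ]


def request_new_identity(password, host="127.0.0.1", port=9051, timeout=5.0):
    """Send the commands one by one and return the reply line for each.

    Raises RuntimeError as soon as a reply does not start with "250".
    """
    replies = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        with conn.makefile("rb") as reader:
            for command in control_commands(password):
                conn.sendall(command)
                reply = reader.readline().decode("ascii", "replace").strip()
                replies.append(reply)
                if not reply.startswith("250"):
                    raise RuntimeError("Tor control port replied: " + reply)
    return replies
```

Calling `request_new_identity("267765")` would correspond to the exchange in the guide. Checking each reply makes a wrong password visible as an error instead of a silent timeout.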
Then use it in settings.py:
DOWNLOADER_MIDDLEWARES = {
    'spider.middlewares.RetryChangeProxyMiddleware': 600,
}
Then you just need to send the requests through the local Tor proxy (polipo), which can be done with:
tsocks scrapy crawl spider
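An alternative to wrapping the whole process is to set the proxy per request: Scrapy reads the `proxy` key of `request.meta` when downloading. The following is a minimal sketch of a downloader middleware that does this. The class name `TorProxyMiddleware` is hypothetical, and the address assumes polipo is listening on its default port 8123 and forwarding to Tor's SOCKS port; adjust it to your setup.

```python
class TorProxyMiddleware:
    """Downloader middleware sketch: send every request via a local HTTP proxy."""

    # Assumption: polipo listens here and forwards traffic to Tor.
    proxy_url = "http://127.0.0.1:8123"

    def process_request(self, request, spider):
        # Scrapy uses this meta key to pick the proxy for the request.
        request.meta["proxy"] = self.proxy_url
        return None  # continue with normal request processing
```

It would be enabled in settings.py the same way as the retry middleware, for example under a path such as 'spider.middlewares.TorProxyMiddleware' (also hypothetical).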
Can anyone confirm that this method works and that you get different IPs?
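One way to check this yourself is to record the visible IP before and after each identity change. The helper below is only a sketch of that loop: `collect_exit_ips` is a made-up name, and both `fetch_ip` (something that returns the IP a remote server sees, fetched through the proxy) and `rotate` (something that sends NEWNYM) are callables you would have to supply.

```python
import time


def collect_exit_ips(fetch_ip, rotate, rounds=3, pause=0.0):
    """Observe the exit IP once per round, rotating between rounds.

    fetch_ip: callable returning the currently visible IP as a string.
    rotate:   callable that asks Tor for a new identity.
    pause:    seconds to wait after each rotation before fetching again.
    Returns the observed IPs in order.
    """
    seen = []
    for i in range(rounds):
        if i > 0:
            rotate()
            time.sleep(pause)
        seen.append(fetch_ip())
    return seen
```

If `len(set(ips)) > 1` for the returned list, the rotation produced at least one different address. Seeing the same IP twice does not by itself mean the setup is broken, since Tor may not act on every NEWNYM signal immediately and a new circuit can end at the same exit.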
Recommended answer
I used this snippet: http://snipplr.com/view/66992/use-a-random-user-agent-for-each-request/
Update: broken link fixed.
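The linked snippet is not reproduced here. Going only by its title, the idea is a downloader middleware that picks a different User-Agent for each request. A minimal sketch of that idea, not the linked code itself, could look like this; the class name and the user-agent strings are placeholders.

```python
import random


class RandomUserAgentMiddleware:
    """Downloader middleware sketch: random User-Agent header per request."""

    # Placeholder values; substitute real browser strings.
    user_agents = [
        "ExampleBrowser/1.0",
        "ExampleBrowser/2.0",
        "OtherExampleBrowser/5.1",
    ]

    def process_request(self, request, spider):
        # Overwrite whatever User-Agent was set before this middleware ran.
        request.headers["User-Agent"] = random.choice(self.user_agents)
        return None  # continue with normal request processing
```

Like any downloader middleware, it only takes effect once it is listed in DOWNLOADER_MIDDLEWARES.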