Python urllib over TOR?

Question

Example code:

#!/usr/bin/python
import socks
import socket
import urllib2

socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS4, "127.0.0.1", 9050, True)
socket.socket = socks.socksocket

print urllib2.urlopen("http://almien.co.uk/m/tools/net/ip/").read()

TOR is running a SOCKS proxy on port 9050 (its default). The request goes through TOR, surfacing at an IP address other than my own. However, the TOR console gives the warning:

"Feb 28 22:44:26.233 [warn] Your application (using socks4 to port 80) is giving Tor only an IP address. Applications that do DNS resolves themselves may leak information. Consider using Socks4A (e.g. via privoxy or socat) instead. For more information, please see https://wiki.torproject.org/TheOnionRouter/TorFAQ#SOCKSAndDNS."

i.e. DNS lookups aren't going through the proxy. But that's what the 4th parameter to setdefaultproxy is supposed to do, right?

From http://socksipy.sourceforge.net/readme.txt:

setproxy(proxytype, addr[, port[, rdns[, username[, password]]]])

rdns - This is a boolean flag that modifies the behavior regarding DNS resolving. If it is set to True, DNS resolving will be performed remotely, on the server.

Same effect with both PROXY_TYPE_SOCKS4 and PROXY_TYPE_SOCKS5 selected.

It can't be a local DNS cache (if urllib2 even supports that) because it happens when I change the URL to a domain that this computer has never visited before.

Recommended answer

The problem is that httplib.HTTPConnection uses the socket module's create_connection helper function which does the DNS request via the usual getaddrinfo method before connecting the socket.
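One way to see this local lookup for yourself is the stdlib-only sketch below. It temporarily swaps `socket.getaddrinfo` for a stub that records the hostname and refuses to resolve it, then calls `socket.create_connection`. The helper name `resolved_locally` and the stub are illustrative assumptions, not part of any library, and no network access is needed.

```python
import socket


def resolved_locally(host, port=80):
    """Return the hostnames socket.create_connection tried to resolve locally."""
    seen = []
    original = socket.getaddrinfo

    def recording_getaddrinfo(name, *args, **kwargs):
        # Record the lookup, then fail so that no real DNS query is sent.
        seen.append(name)
        raise socket.gaierror("lookup blocked for demonstration")

    socket.getaddrinfo = recording_getaddrinfo
    try:
        socket.create_connection((host, port), timeout=1)
    except socket.gaierror:
        pass  # expected: the stub refuses every lookup
    finally:
        socket.getaddrinfo = original  # always restore the real resolver
    return seen


print(resolved_locally("example.invalid"))
```

The hostname shows up in the recorded list even though no socket was ever connected, which is why replacing `socket.socket` alone does not stop the lookup.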

The solution is to make your own create_connection function and monkey-patch it into the socket module before importing urllib2, just like we do with the socket class.

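import socks
import socket

def create_connection(address, timeout=socket._GLOBAL_DEFAULT_TIMEOUT,
                      source_address=None):
    # Pass the (hostname, port) pair to the SOCKS socket unresolved,
    # so that no local getaddrinfo call is made.
    sock = socks.socksocket()
    if timeout is not socket._GLOBAL_DEFAULT_TIMEOUT:
        sock.settimeout(timeout)
    if source_address:
        sock.bind(source_address)
    sock.connect(address)
    return sock

# TOR's SOCKS proxy on its default port
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, "127.0.0.1", 9050)

# patch the socket module
socket.socket = socks.socksocket
socket.create_connection = create_connection

import urllib2

# Now you can go ahead and scrape those shady darknet .onion sites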
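To check that nothing in the process still resolves hostnames locally, a guard along the following lines can help. It is only a sketch under stated assumptions: the names `is_ip_literal`, `install_dns_guard` and `remove_dns_guard` are hypothetical, and the guard only sees lookups made through the Python-level `socket.getaddrinfo` and `socket.gethostbyname` functions. It also assumes `socket.inet_pton` is available on your platform.

```python
import socket

_real_getaddrinfo = socket.getaddrinfo
_real_gethostbyname = socket.gethostbyname


def is_ip_literal(host):
    """Return True if host is already an IPv4 or IPv6 address string."""
    for family in (socket.AF_INET, socket.AF_INET6):
        try:
            socket.inet_pton(family, host)
            return True
        except (socket.error, TypeError, ValueError):
            pass
    return False


def _refuse_hostnames(real_function):
    """Wrap a resolver so that it raises instead of resolving a hostname."""
    def guarded(host, *args, **kwargs):
        if host and not is_ip_literal(host):
            raise RuntimeError("local DNS lookup attempted for %r" % (host,))
        return real_function(host, *args, **kwargs)
    return guarded


def install_dns_guard():
    """Make any local hostname lookup fail loudly."""
    socket.getaddrinfo = _refuse_hostnames(_real_getaddrinfo)
    socket.gethostbyname = _refuse_hostnames(_real_gethostbyname)


def remove_dns_guard():
    """Restore the original resolver functions."""
    socket.getaddrinfo = _real_getaddrinfo
    socket.gethostbyname = _real_gethostbyname
```

With the guard installed, a request that still triggers a local lookup raises `RuntimeError` rather than quietly leaking the hostname. Address literals such as "127.0.0.1" pass through untouched, but names like "localhost" are refused too.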
