Force python mechanize/urllib2 to only use A requests?
Problem description
Here is a related question, but I could not figure out how to apply the answer to mechanize/urllib2: "how to force python httplib library to use only A requests"
Basically, given this simple code:
#!/usr/bin/python
import urllib2
print urllib2.urlopen('http://python.org/').read(100)
This results in wireshark saying the following:
0.000000 10.102.0.79 -> 8.8.8.8 DNS Standard query A python.org
0.000023 10.102.0.79 -> 8.8.8.8 DNS Standard query AAAA python.org
0.005369 8.8.8.8 -> 10.102.0.79 DNS Standard query response A 82.94.164.162
5.004494 10.102.0.79 -> 8.8.8.8 DNS Standard query A python.org
5.010540 8.8.8.8 -> 10.102.0.79 DNS Standard query response A 82.94.164.162
5.010599 10.102.0.79 -> 8.8.8.8 DNS Standard query AAAA python.org
5.015832 8.8.8.8 -> 10.102.0.79 DNS Standard query response AAAA 2001:888:2000:d::a2
That's a 5 second delay!
I don't have IPv6 enabled anywhere in my system (gentoo compiled with USE=-ipv6), so I don't think that python has any reason to even try an IPv6 lookup.
The above referenced question suggested explicitly setting the socket type to AF_INET, which sounds great. I have no idea how to force urllib or mechanize to use any sockets that I create, though.
EDIT: I know that the AAAA queries are the issue because other apps had the delay as well, and as soon as I recompiled with ipv6 disabled, the problem went away... except in python, which still performs the AAAA requests.
Recommended answer
Based on J.J.
This basically forces the family parameter of socket.getaddrinfo(..) to socket.AF_INET instead of socket.AF_UNSPEC (zero, which is what seems to be used in socket.create_connection). It does so not only for calls from urllib2 but for all calls to socket.getaddrinfo(..):
#--------------------
# do this once at program startup
#--------------------
import socket
origGetAddrInfo = socket.getaddrinfo
def getAddrInfoWrapper(host, port, family=0, socktype=0, proto=0, flags=0):
    # ignore the requested family and always ask for IPv4 (A records only)
    return origGetAddrInfo(host, port, socket.AF_INET, socktype, proto, flags)
# replace the original socket.getaddrinfo by our version
socket.getaddrinfo = getAddrInfoWrapper
#--------------------
import urllib2
print urllib2.urlopen("http://python.org/").read(100)
This works for me at least in this simple case.
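Since the patch above changes socket.getaddrinfo for the whole process, here is a rough sketch of the same idea scoped to a block of code and restored afterwards. It is written for Python 3 (an assumption; the code above is Python 2), and the names make_ipv4_only and ipv4_only are made up for this illustration, not part of any library.

```python
import socket
from contextlib import contextmanager


def make_ipv4_only(getaddrinfo):
    """Wrap a getaddrinfo-like callable so it always asks for AF_INET."""
    def wrapper(host, port, family=0, type=0, proto=0, flags=0):
        # Ignore the caller's family and request IPv4 (A records) only.
        return getaddrinfo(host, port, socket.AF_INET, type, proto, flags)
    return wrapper


@contextmanager
def ipv4_only():
    """Temporarily replace socket.getaddrinfo, restoring it on exit."""
    original = socket.getaddrinfo
    socket.getaddrinfo = make_ipv4_only(original)
    try:
        yield
    finally:
        socket.getaddrinfo = original


if __name__ == "__main__":
    with ipv4_only():
        # A numeric address needs no DNS lookup, so this runs offline.
        for family, _, _, _, sockaddr in socket.getaddrinfo("127.0.0.1", 80):
            print(family, sockaddr)
```

Any code that resolves names through socket.getaddrinfo inside the with block (for example socket.create_connection) should then only ask for A records. Libraries that resolve names some other way would not be affected by this.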