Repeated host lookups failing in urllib2


Problem Description



I have code which issues many HTTP GET requests using Python's urllib2, in several threads, writing the responses into files (one per thread).
During execution, it looks like many of the host lookups fail (causing a name or service unknown error, see appended error log for an example).

Is this due to a flaky DNS service? Is it bad practice to rely on DNS caching, if the host name isn't changing? I.e. should a single lookup's result be passed into the urlopen?
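
For reference, a rough sketch of what passing a single lookup's result into urlopen could look like: resolve the host name once, rewrite the URL to use the resulting IP, and keep the original name in the Host header. The helper name resolve_once is invented for this sketch, and rewriting to an IP only works cleanly for plain HTTP (HTTPS certificate checks would fail against a bare IP):

import socket
import urllib2
import urlparse

def resolve_once(url):
    # Illustrative helper, not part of the question's code.
    # Resolve the host name a single time and bake the IP into the URL.
    parts = urlparse.urlsplit(url)
    ip = socket.gethostbyname(parts.hostname)
    netloc = ip if parts.port is None else "%s:%d" % (ip, parts.port)
    rewritten = urlparse.urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment))
    # Keep the original host name in the Host header so virtual hosting still works.
    return urllib2.Request(rewritten, headers={"Host": parts.netloc})

# page = urllib2.urlopen(resolve_once(url))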

Exception in thread Thread-16:
Traceback (most recent call last):
  File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
    self.run()
  File "/home/da/local/bin/ThreadedDownloader.py", line 61, in run
     page = urllib2.urlopen(url) # get the page
  File "/usr/lib/python2.6/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.6/urllib2.py", line 391, in open
    response = self._open(req, data)
  File "/usr/lib/python2.6/urllib2.py", line 409, in _open
    '_open', req)
  File "/usr/lib/python2.6/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.6/urllib2.py", line 1170, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.6/urllib2.py", line 1145, in do_open
    raise URLError(err)
URLError: <urlopen error [Errno -2] Name or service not known>

UPDATE: my (extremely simple) code:

import os
import threading
import urllib2

# Each worker appends numPages pages for one item to <outDir>/<item>.xml.
class AsyncGet(threading.Thread):

    def __init__(self, outDir, baseUrl, item, method, numPages, numRows, semaphore):
        threading.Thread.__init__(self)
        self.outDir = outDir
        self.baseUrl = baseUrl
        self.method = method
        self.numPages = numPages
        self.numRows = numRows
        self.item = item
        self.semaphore = semaphore

    def run(self):
        with self.semaphore: # 'with' is awesome.
            with open(os.path.join(self.outDir, self.item + ".xml"), 'a') as f:
                for i in xrange(1, self.numPages + 1):
                    url = self.baseUrl + \
                          "method=" + self.method + \
                          "&item=" + self.item + \
                          "&page=" + str(i) + \
                          "&rows=" + str(self.numRows) + \
                          "&prettyXML"
                    page = urllib2.urlopen(url)
                    f.write(page.read())
                    page.close() # Must remember to close!

The semaphore is a BoundedSemaphore to constrain the total number of running threads.
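
The code that spawns the threads isn't shown in the question; purely as an illustration of how such workers might be driven, with made-up values for outDir, baseUrl, items, numPages, numRows and the method name:

import threading

# outDir, baseUrl, items, numPages, numRows and "getRows" are assumed here for illustration.
semaphore = threading.BoundedSemaphore(10)  # at most 10 threads touch the network at once
threads = [AsyncGet(outDir, baseUrl, item, "getRows", numPages, numRows, semaphore)
           for item in items]
for t in threads:
    t.start()
for t in threads:
    t.join()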

Solution

This is not a Python problem; on Linux systems, make sure nscd (Name Service Cache Daemon) is actually running.
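
If enabling nscd is not possible, a rough in-process workaround (a sketch, not part of the original answer) is to memoize socket.getaddrinfo, which urllib2 ultimately uses for lookups, so each host name is resolved at most once per run:

import socket

# Illustrative monkey-patch, not from the original answer: cache successful lookups.
_real_getaddrinfo = socket.getaddrinfo
_dns_cache = {}

def _cached_getaddrinfo(*args):
    # A race between threads at worst causes one redundant lookup.
    if args not in _dns_cache:
        _dns_cache[args] = _real_getaddrinfo(*args)
    return _dns_cache[args]

socket.getaddrinfo = _cached_getaddrinfo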

UPDATE: And looking at your code, you are never calling page.close(), hence leaking sockets.
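
One way to make sure the socket is released even when a request fails midway (reusing url and f from the run() method above) is contextlib.closing, available in Python 2.6; this is a sketch rather than the answerer's suggestion:

import contextlib
import urllib2

# Sketch only: url and f are assumed to come from the run() loop in the question.
with contextlib.closing(urllib2.urlopen(url)) as page:
    # close() runs even if read() or write() raises.
    f.write(page.read())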
