Repeated host lookups failing in urllib2

Published: 2022/1/4 14:17:30. Tags: python multithreading http dns urllib2

Problem description

My code uses Python's urllib2 to issue many HTTP GET requests from several threads, writing the responses to files (one per thread).
During execution, many of the host lookups appear to fail with a "Name or service not known" error (see the appended error log for an example).

Is this due to a flaky DNS service? Is it bad practice to rely on DNS caching if the host name isn't changing? That is, should the result of a single lookup be passed into urlopen?
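To make the last question concrete, here is a minimal sketch of what "passing a single lookup's result into urlopen" could look like. It assumes plain HTTP and a URL without user credentials; `pinned_request` is a hypothetical helper name, not part of urllib2, and the import fallback is only there so the same sketch also runs on Python 3. It rewrites the URL to point at an already-resolved IP address and keeps the original host name in the `Host` header.

```python
import socket

try:                                    # Python 2, as in the question
    from urllib2 import Request
    from urlparse import urlsplit, urlunsplit
except ImportError:                     # Python 3 equivalents
    from urllib.request import Request
    from urllib.parse import urlsplit, urlunsplit


def pinned_request(url, ip):
    """Build a Request aimed at `ip` that still names the original host.

    Hypothetical helper: `ip` is expected to come from one earlier lookup,
    e.g. socket.gethostbyname(host), which is then reused for every URL.
    """
    parts = urlsplit(url)
    # Keep an explicit port if the URL had one.
    netloc = ip if parts.port is None else "%s:%d" % (ip, parts.port)
    pinned = urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment))
    # An explicit Host header is left alone by the HTTP handler, so
    # name-based virtual hosts on the server still see the right name.
    return Request(pinned, headers={"Host": parts.netloc})
```

A caller would resolve once, for example `ip = socket.gethostbyname("example.com")`, and then call `urlopen(pinned_request(url, ip))` for each page. This only sidesteps repeated lookups; it does not explain why they fail, and it is unlikely to suit HTTPS, where certificate checks would be made against the IP address.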

Exception in thread Thread-16:
Traceback (most recent call last):
  File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
    self.run()
  File "/home/da/local/bin/ThreadedDownloader.py", line 61, in run
     page = urllib2.urlopen(url) # get the page
  File "/usr/lib/python2.6/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.6/urllib2.py", line 391, in open
    response = self._open(req, data)
  File "/usr/lib/python2.6/urllib2.py", line 409, in _open
    '_open', req)
  File "/usr/lib/python2.6/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.6/urllib2.py", line 1170, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.6/urllib2.py", line 1145, in do_open
    raise URLError(err)
URLError: <urlopen error [Errno -2] Name or service not known>
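The traceback shows the failed lookup surfacing as a `URLError` from `urlopen`. If the failures are transient, one possible mitigation is to catch that exception and retry after a short pause. The sketch below assumes this is acceptable for the workload; `fetch_with_retry`, its parameters, and the injectable `opener` argument are hypothetical names introduced for illustration.

```python
import time

try:                                    # Python 2, as in the question
    from urllib2 import URLError, urlopen
except ImportError:                     # Python 3 equivalents
    from urllib.error import URLError
    from urllib.request import urlopen


def fetch_with_retry(url, attempts=3, delay=0.5, opener=urlopen):
    """Open `url`, retrying when URLError is raised.

    Note that HTTPError is a subclass of URLError, so HTTP error
    statuses are retried as well in this simple version.
    """
    for attempt in range(1, attempts + 1):
        try:
            return opener(url)
        except URLError:
            if attempt == attempts:
                raise                   # out of attempts: re-raise the error
            time.sleep(delay * attempt) # wait a little longer each time
```

In the thread's `run` method, `urllib2.urlopen(url)` could then be replaced by `fetch_with_retry(url)`. Retrying hides the symptom rather than identifying the cause, so logging each failure is still worthwhile.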

Update: my (very simple) code

import os
import threading
import urllib2

class AsyncGet(threading.Thread):

    def __init__(self, outDir, baseUrl, item, method, numPages, numRows, semaphore):
        threading.Thread.__init__(self)
        self.outDir = outDir
        self.baseUrl = baseUrl
        self.method = method
        self.numPages = numPages
        self.numRows = numRows
        self.item = item
        self.semaphore = semaphore

    def run(self):
        with self.semaphore: # 'with' is awesome.
            # Append every page for this item to a single XML file.
            with open(os.path.join(self.outDir, self.item + ".xml"), 'a') as f:
                for i in xrange(1, self.numPages + 1):
                    # Build the query string for page i.
                    url = (self.baseUrl +
                           "method=" + self.method +
                           "&item=" + self.item +
                           "&page=" + str(i) +
                           "&rows=" + str(self.numRows) +
                           "&prettyXML")
                    page = urllib2.urlopen(url)
                    f.write(page.read())
                    page.close() # Must remember to close!

The semaphore is a BoundedSemaphore that limits the total number of running threads.
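As a rough illustration of how a `BoundedSemaphore` caps concurrency in this pattern, the sketch below starts every thread up front and lets each one block on the semaphore before doing its work. `run_limited` and its parameters are hypothetical, and the short sleep stands in for the download.

```python
import threading
import time


def run_limited(num_threads, max_running, work=lambda: time.sleep(0.01)):
    """Run `work` in num_threads threads, at most max_running at a time.

    Returns the highest number of threads observed inside the guarded
    section at the same moment.
    """
    semaphore = threading.BoundedSemaphore(max_running)
    lock = threading.Lock()             # protects the counters below
    state = {"running": 0, "peak": 0}

    def worker():
        with semaphore:                 # blocks while max_running are active
            with lock:
                state["running"] += 1
                state["peak"] = max(state["peak"], state["running"])
            work()
            with lock:
                state["running"] -= 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"]
```

Calling `run_limited(10, 3)` creates ten threads but never lets more than three do work at once. All ten thread objects still exist from the start; the semaphore limits only how many are past the `with` statement.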
