Python rapidly creating and removing directories will cause WindowsError [Error 5] intermittently
I encountered this problem while using Scrapy's FifoDiskQueue. On Windows, FifoDiskQueue causes directories and files to be created by one file descriptor and consumed (and, once no more messages remain in the queue, removed) by another file descriptor.
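To make the failure mode concrete, here is a stdlib-only sketch of the directory lifecycle that FifoDiskQueue exhibits. The class and its internals are illustrative, not queuelib's actual code; only the makedirs-in-__init__ and remove-when-drained behavior mirrors what the traceback shows.

```python
import os

class TinyDiskQueue:
    """Illustrative sketch of the directory lifecycle queuelib's
    FifoDiskQueue exhibits: the constructor creates the queue
    directory, and close() deletes it once the queue is drained.
    This is NOT queuelib's implementation, just the create/remove
    pattern that triggers the intermittent Error 5 on Windows."""

    def __init__(self, path):
        self.path = path
        os.makedirs(path)  # the call that intermittently fails with Error 5
        self.datafile = os.path.join(path, "q.data")
        self.items = []

    def push(self, data):
        self.items.append(data)
        with open(self.datafile, "ab") as f:  # persist pushed bytes
            f.write(data)

    def pop(self):
        return self.items.pop(0) if self.items else None

    def close(self):
        # A drained queue removes its files and its directory; under
        # load, the scheduler repeats this create/remove cycle rapidly.
        if not self.items:
            if os.path.exists(self.datafile):
                os.remove(self.datafile)
            os.rmdir(self.path)
```

A scheduler that drains and recreates such queues in quick succession performs exactly the makedirs/rmdir churn that an external scanner can interrupt.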
I get error messages like the following, at random:
2015-08-25 18:51:30 [scrapy] INFO: Error while handling downloader output
Traceback (most recent call last):
File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line 588, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "C:\Python27\lib\site-packages\scrapy\core\engine.py", line 154, in _handle_downloader_output
self.crawl(response, spider)
File "C:\Python27\lib\site-packages\scrapy\core\engine.py", line 182, in crawl
self.schedule(request, spider)
File "C:\Python27\lib\site-packages\scrapy\core\engine.py", line 188, in schedule
if not self.slot.scheduler.enqueue_request(request):
File "C:\Python27\lib\site-packages\scrapy\core\scheduler.py", line 54, in enqueue_request
dqok = self._dqpush(request)
File "C:\Python27\lib\site-packages\scrapy\core\scheduler.py", line 83, in _dqpush
self.dqs.push(reqd, -request.priority)
File "C:\Python27\lib\site-packages\queuelib\pqueue.py", line 33, in push
self.queues[priority] = self.qfactory(priority)
File "C:\Python27\lib\site-packages\scrapy\core\scheduler.py", line 106, in _newdq
return self.dqclass(join(self.dqdir, 'p%s' % priority))
File "C:\Python27\lib\site-packages\queuelib\queue.py", line 43, in __init__
os.makedirs(path)
File "C:\Python27\lib\os.py", line 157, in makedirs
mkdir(name, mode)
WindowsError: [Error 5] : './sogou_job\\requests.queue\\p-50'
In Windows, Error 5 means access is denied. Many explanations on the web attribute it to a lack of administrative rights, like this MSDN post. But the reason is not related to access rights: when I run the scrapy crawl command in an Administrator command prompt, the problem still occurs.
I then created a small test like this to try on Windows and Linux:
#!/usr/bin/python
import os
import shutil
import time

for i in range(1000):
    somedir = "testingdir"
    try:
        os.makedirs(somedir)
        with open(os.path.join(somedir, "testing.txt"), 'w') as out:
            out.write("Oh no")
        shutil.rmtree(somedir)
    except WindowsError as e:
        print 'round', i, e
        time.sleep(0.1)
        raise
When I run this, I get:
round 13 [Error 5] : 'testingdir'
Traceback (most recent call last):
File "E:\FHT360\FHT360_Mobile\Source\keywordranks\test.py", line 10, in <module>
os.makedirs(somedir)
File "C:\Users\yj\Anaconda\lib\os.py", line 157, in makedirs
mkdir(name, mode)
WindowsError: [Error 5] : 'testingdir'
The round number is different every time. If I remove the raise at the end, I get something like this:
round 5 [Error 5] : 'testingdir'
round 67 [Error 5] : 'testingdir'
round 589 [Error 5] : 'testingdir'
round 875 [Error 5] : 'testingdir'
It simply fails randomly, with a small probability, and ONLY on Windows. I tried this test script in Cygwin and on Linux, and the error never happens there. I also ran the same code on another Windows machine, where it occurs as well.
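One way to probe whether the failure is transient (a briefly held handle) rather than a real permissions problem is to retry the same mkdir after a short pause. This is a hedged variant of the test above; the mkdir_with_retry helper is mine, not part of any library, and the retry counts it reports are only meaningful on a Windows machine that reproduces the error.

```python
import os
import shutil
import time

def mkdir_with_retry(path, attempts=10, delay=0.05):
    """Retry os.makedirs a few times; a transient 'Access is denied'
    caused by a scanner holding a handle usually clears within
    milliseconds. Returns the number of failed attempts before
    success, and re-raises if every attempt fails."""
    for attempt in range(attempts):
        try:
            os.makedirs(path)
            return attempt
        except OSError:
            if attempt == attempts - 1:
                raise  # still failing after all retries: a real error
            time.sleep(delay)

# Same rapid create/remove cycle as the test above, now retried.
retried_rounds = 0
for i in range(100):
    somedir = "testingdir"
    if mkdir_with_retry(somedir) > 0:
        retried_rounds += 1  # would have been an Error 5 without retry
    with open(os.path.join(somedir, "testing.txt"), "w") as out:
        out.write("Oh no")
    shutil.rmtree(somedir)
```

If the loop completes with retried_rounds greater than zero where the original script raised Error 5, the failure was transient, not a rights issue.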
What are possible reasons for this?
[Update] Screenshot of proof [管理员 means Administrator in Chinese]:
Also proof that the test case still fails in an administrator command prompt:
@pss said that he couldn't reproduce the issue. I tried it on our Windows 7 server with a fresh install of 64-bit Python 2.7.10. I had to set a really large upper bound on the number of rounds, and only started to see errors appear after round 19963:
Short answer: disable any antivirus or document-indexing software, or at least configure it not to scan your working directory.
Long answer: you can spend months trying to fix this kind of problem. So far, the only workaround that does not involve disabling the antivirus is to assume that you will not always be able to remove every file or directory.
Build that assumption into your code: use a different root subdirectory each time the service starts, and try to clean up the older ones, ignoring removal failures.
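A minimal sketch of that workaround, with helper names of my own invention (not from Scrapy or queuelib): give each service run a fresh subdirectory, and sweep older ones on a best-effort basis.

```python
import os
import shutil
import uuid

def fresh_workdir(root):
    """Create a unique per-run subdirectory under root, so directories
    a scanner still holds open never collide with the current run."""
    path = os.path.join(root, uuid.uuid4().hex)
    os.makedirs(path)
    return path

def sweep_old_workdirs(root, keep):
    """Best-effort removal of previous runs' subdirectories.
    ignore_errors=True swallows the intermittent Error 5; whatever
    cannot be deleted now will be retried at the next startup."""
    for name in os.listdir(root):
        path = os.path.join(root, name)
        if path != keep and os.path.isdir(path):
            shutil.rmtree(path, ignore_errors=True)

# At service startup:
#   workdir = fresh_workdir("queues_root")
#   sweep_old_workdirs("queues_root", keep=workdir)
```

Removal failures thus never break a run; stale directories simply linger until a later sweep succeeds.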