Python rapidly creating and removing directories will cause WindowsError [Error 5] intermittently


Problem description


I encountered this problem while using Scrapy's FifoDiskQueue. On Windows, FifoDiskQueue causes directories and files to be created by one file descriptor and consumed (and, if there are no more messages in the queue, removed) by another file descriptor.

I randomly get error messages like the following:

2015-08-25 18:51:30 [scrapy] INFO: Error while handling downloader output
Traceback (most recent call last):
  File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line 588, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "C:\Python27\lib\site-packages\scrapy\core\engine.py", line 154, in _handle_downloader_output
    self.crawl(response, spider)
  File "C:\Python27\lib\site-packages\scrapy\core\engine.py", line 182, in crawl
    self.schedule(request, spider)
  File "C:\Python27\lib\site-packages\scrapy\core\engine.py", line 188, in schedule
    if not self.slot.scheduler.enqueue_request(request):
  File "C:\Python27\lib\site-packages\scrapy\core\scheduler.py", line 54, in enqueue_request
    dqok = self._dqpush(request)
  File "C:\Python27\lib\site-packages\scrapy\core\scheduler.py", line 83, in _dqpush
    self.dqs.push(reqd, -request.priority)
  File "C:\Python27\lib\site-packages\queuelib\pqueue.py", line 33, in push
    self.queues[priority] = self.qfactory(priority)
  File "C:\Python27\lib\site-packages\scrapy\core\scheduler.py", line 106, in _newdq
    return self.dqclass(join(self.dqdir, 'p%s' % priority))
  File "C:\Python27\lib\site-packages\queuelib\queue.py", line 43, in __init__
    os.makedirs(path)
  File "C:\Python27\lib\os.py", line 157, in makedirs
    mkdir(name, mode)
WindowsError: [Error 5] : './sogou_job\\requests.queue\\p-50'

In Windows, Error 5 means access is denied. Many explanations on the web attribute it to a lack of administrative rights, like this MSDN post. But the reason is not related to access rights: when I run the scrapy crawl command in an Administrator command prompt, the problem still occurs.
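As an aside on how this error surfaces in Python: CPython maps the Win32 code 5 (ERROR_ACCESS_DENIED) onto errno.EACCES, so a handler can recognize it portably. A minimal sketch (the helper name is mine, not from Scrapy or queuelib):

```python
import errno
import os

def is_access_denied(exc):
    # True when the OSError represents "access is denied"
    # (WinError 5 on Windows, which CPython maps to errno.EACCES).
    if getattr(exc, "winerror", None) == 5:  # attribute exists on Windows only
        return True
    return exc.errno == errno.EACCES

# Portable demonstration: build the equivalent OSError by hand.
err = OSError(errno.EACCES, os.strerror(errno.EACCES), "testingdir")
print(is_access_denied(err))  # True
```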

I then created a small test like this to try on Windows and Linux:

#!/usr/bin/python

import os
import shutil
import time

# Repeatedly create a directory, write a file into it, then remove the tree.
for i in range(1000):
    somedir = "testingdir"
    try:
        os.makedirs(somedir)
        with open(os.path.join(somedir, "testing.txt"), 'w') as out:
            out.write("Oh no")
        shutil.rmtree(somedir)
    except WindowsError as e:
        # Report which round failed, pause briefly, and re-raise.
        print 'round', i, e
        time.sleep(0.1)
        raise

When I run this, I will get:

round 13 [Error 5] : 'testingdir'
Traceback (most recent call last):
  File "E:\FHT360\FHT360_Mobile\Source\keywordranks\test.py", line 10, in <module>
    os.makedirs(somedir)
  File "C:\Users\yj\Anaconda\lib\os.py", line 157, in makedirs
    mkdir(name, mode)
WindowsError: [Error 5] : 'testingdir'

The failing round differs every run. So if I remove the raise at the end, I get something like this:

round 5 [Error 5] : 'testingdir'
round 67 [Error 5] : 'testingdir'
round 589 [Error 5] : 'testingdir'
round 875 [Error 5] : 'testingdir'

It simply fails randomly, with a small probability, and ONLY on Windows. I tried this test script under Cygwin and on Linux; the error never happens there. I also tried the same code on another Windows machine, and it occurs there too.

What are possible reasons for this?

[Update] Screenshot of proof [管理员 means Administrator in Chinese]:

Also proof that the test case still fails in an administrator command prompt:

@pss said that he couldn't reproduce the issue. I tried our Windows 7 server. I installed a fresh Python 2.7.10 64-bit. I had to set a really large upper bound for the rounds, and only started to see errors appear after round 19963:

Solution

Short: disable any antivirus or document indexing, or at least configure them not to scan your working directory.

Long: you can spend months trying to fix this kind of problem. So far, the only workaround that does not involve disabling the antivirus is to assume that you will not always be able to remove all files or directories.
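One commonly attempted fix is to retry the failing call after a short delay; as the answer above implies, this is not fully reliable, because a scanner can hold the file handle longer than any delay you pick. For reference, a minimal sketch of that approach (the helper name and retry parameters are hypothetical):

```python
import errno
import os
import shutil
import time

def retry_on_access_denied(fn, attempts=5, delay=0.1):
    # Call fn(), retrying a few times when it fails with EACCES
    # (how WindowsError [Error 5] surfaces via the errno attribute).
    for attempt in range(attempts):
        try:
            return fn()
        except OSError as e:
            if e.errno != errno.EACCES or attempt == attempts - 1:
                raise
            time.sleep(delay)

# Same create/write/remove cycle as the test script, wrapped in retries.
somedir = "testingdir"
retry_on_access_denied(lambda: os.makedirs(somedir))
with open(os.path.join(somedir, "testing.txt"), "w") as out:
    out.write("Oh no")
retry_on_access_denied(lambda: shutil.rmtree(somedir))
```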

Assume this in your code: use a different root subdirectory each time the service starts, try to clean up the older ones, and ignore removal failures.
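That workaround can be sketched as follows. This is not Scrapy's actual code; the directory layout and function names are assumptions for illustration:

```python
import os
import shutil
import uuid

BASE = "queue_roots"  # hypothetical base directory holding all queue roots

def start_queue_root(base=BASE):
    # Create and return a fresh, uniquely named root for this run.
    root = os.path.join(base, "run-" + uuid.uuid4().hex)
    os.makedirs(root)
    return root

def cleanup_old_roots(base=BASE, keep=None):
    # Best-effort removal of roots left over from earlier runs;
    # failures (e.g. a scanner still holding a handle) are ignored.
    for name in os.listdir(base):
        path = os.path.join(base, name)
        if path == keep or not os.path.isdir(path):
            continue
        try:
            shutil.rmtree(path)
        except OSError:
            pass  # leave it for the next start

root = start_queue_root()
cleanup_old_roots(keep=root)
```

A root that cannot be removed this time is simply retried on the next service start, so no single failed deletion crashes the service.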
