Huge memory leak in repeated os.path.isdir calls?

Question

I've been scripting something that scans directories and noticed a severe memory leak when calling os.path.isdir, so I tried the following snippet:

import os

def func():
    # Plain (byte) string path, as in the original report on Python 2.7.
    if not os.path.isdir('D:\\Downloads'):
        return False

while True:
    func()

Within a few seconds, the Python process reached 100MB RAM.

I'm trying to figure out what's going on. The huge memory leak seems to occur only when the path is indeed a valid directory path (meaning the 'return False' is not executed). It would also be interesting to see what happens in related calls, like os.path.isfile.
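For anyone who wants to check this kind of thing on a current interpreter, here is a minimal sketch of a measurement harness. It assumes Python 3.4+ (tracemalloc does not exist on 2.7, where the leak was observed) and only sees memory allocated through Python's own allocators. The helper name traced_growth and the directory names are made up for illustration.

```python
import os
import tempfile
import tracemalloc


def traced_growth(func, arg, calls=10000):
    """Return the change in traced memory (bytes) across repeated calls."""
    func(arg)  # warm up so one-time allocations are not counted
    tracemalloc.start()
    try:
        before, _ = tracemalloc.get_traced_memory()
        for _ in range(calls):
            func(arg)
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before


if __name__ == "__main__":
    existing = tempfile.gettempdir()
    # Hypothetical name that is assumed not to exist on disk.
    missing = os.path.join(existing, "no-such-directory-for-this-check")
    for label, path in (("existing", existing), ("missing", missing)):
        print(label, traced_growth(os.path.isdir, path), "bytes")
```

A call that leaks a buffer on every invocation should show growth roughly proportional to the number of calls, while a healthy call should stay close to zero. Passing os.path.isfile instead of os.path.isdir compares the related calls the same way.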

Any ideas?

Edit: I think I'm onto something. Although isfile and isdir are implemented in the genericpath module, on Windows isdir is imported from the builtin nt module. So I had to download the 2.7.3 source (which I should have done a long time ago...).

After a bit of searching, I found the posix__isdir function in \Modules\posixmodule.c, which I assume is the 'isdir' function imported from nt.

This part of the function (and comment) caught my eye:

    if (PyArg_ParseTuple(args, "U|:_isdir", &po)) {
        Py_UNICODE *wpath = PyUnicode_AS_UNICODE(po);

        attributes = GetFileAttributesW(wpath);
        if (attributes == INVALID_FILE_ATTRIBUTES)
            Py_RETURN_FALSE;
        goto check;
    }
    /* Drop the argument parsing error as narrow strings
       are also valid. */
    PyErr_Clear();

It seems that it all boils down to a Unicode/ASCII handling bug.

I've just tried my snippet above with the path argument as a unicode string (i.e. u'D:\Downloads'): no memory leak whatsoever. Haha.
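Based on that observation, a workaround could look like the sketch below: decode byte-string paths to text before calling os.path.isdir, so the call goes through the Unicode branch. The helper name isdir_text and the utf-8 fallback are my own assumptions, not part of the standard library.

```python
import os
import sys


def isdir_text(path):
    """Call os.path.isdir with a text (unicode) path.

    Hypothetical helper: byte-string paths are decoded first, so the
    narrow-string branch of the C implementation is never used.
    """
    if isinstance(path, bytes):
        # getfilesystemencoding() may return None on some Python 2 builds;
        # falling back to utf-8 is an assumption.
        encoding = sys.getfilesystemencoding() or "utf-8"
        path = path.decode(encoding)
    return os.path.isdir(path)
```

On Python 2, a plain 'D:\\Downloads' literal is a byte string, so it would be decoded here. Unicode literals and Python 3 str paths pass through unchanged.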

Recommended answer

The root cause is a failure to call PyMem_Free on the path variable in the non-Unicode code path:

    if (!PyArg_ParseTuple(args, "et:_isdir",
                          Py_FileSystemDefaultEncoding, &path))
        return NULL;

    attributes = GetFileAttributesA(path);
    if (attributes == INVALID_FILE_ATTRIBUTES)
        Py_RETURN_FALSE;

check:
    if (attributes & FILE_ATTRIBUTE_DIRECTORY)
        Py_RETURN_TRUE;
    else
        Py_RETURN_FALSE;

As per the documentation on PyArg_ParseTuple:

  • et: Same as es...
  • es: PyArg_ParseTuple() will allocate a buffer of the needed size, copy the encoded data into this buffer and adjust *buffer to reference the newly allocated storage. The caller is responsible for calling PyMem_Free() to free the allocated buffer after use.

It's a bug in Python's standard library (fixed in Python 3 by using bytes objects directly); file a bug report at http://bugs.python.org.
