Listing files in a directory with Python when the directory is huge


Question

I'm trying to deal with many files in Python. I first need to get a list of all the files in a single directory. At the moment, I'm using:

os.listdir(dir)

However, this isn't feasible, since the directory I'm searching has upward of 81,000 files in it, totaling almost 5 gigabytes.

What's the best way of stepping through each file one by one, without Windows deciding that the Python process is not responding and killing it? That tends to happen.

It's being run on a 32-bit Windows XP machine, so clearly it can't address more than 4 GB of RAM.

Does anyone have any other ideas for solving this problem?

Recommended answer

You may want to try using the scandir module:

scandir is a module which provides a generator version of os.listdir() that also exposes the extra file information the operating system returns when you iterate a directory. scandir also provides a much faster version of os.walk(), because it can use the extra file information exposed by the scandir() function.

There's an accepted PEP proposing to merge it into the Python standard library, so it seems to have some traction.

Simple usage example from their docs:

import os

def subdirs(path):
    """Yield directory names not starting with '.' under given path."""
    for entry in os.scandir(path):
        if not entry.name.startswith('.') and entry.is_dir():
            yield entry.name
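
For the original problem of stepping through each file one by one, the same idea can be turned into a generator over files rather than subdirectories. The following is a minimal sketch, not a definitive implementation. It assumes Python 3.6 or later, where os.scandir() is in the standard library (added in 3.5 via PEP 471) and can be used as a context manager. The names iter_files and process_file are hypothetical and introduced only for illustration.

```python
import os


def iter_files(path):
    """Lazily yield the full path of each regular file directly under path."""
    # os.scandir() returns an iterator, so entries are read as they are
    # consumed instead of building one large list up front.
    with os.scandir(path) as entries:
        for entry in entries:
            # is_file() can usually answer from data the directory scan
            # already returned, without an extra system call per file.
            if entry.is_file(follow_symlinks=False):
                yield entry.path


def process_file(file_path):
    """Hypothetical placeholder for the per-file work; returns the size in bytes."""
    return os.path.getsize(file_path)


if __name__ == "__main__":
    total = 0
    for file_path in iter_files("."):
        total += process_file(file_path)
    print(total)
```

Because each path is produced on demand, the loop can start working on the first file without waiting for the whole directory to be read. Note that this only changes how file names are enumerated; the memory used while processing each file depends on how process_file reads it.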
