List files in a folder as a stream to begin processing immediately

Question

I have a folder with 1 million files in it.

I would like to begin processing immediately, while the files in this folder are still being listed, in Python or another scripting language.

The usual functions (os.listdir in Python, for example) are blocking: my program has to wait until the whole listing is complete, which can take a long time.

What's the best way to list huge folders?

Recommended answer

If convenient, change your directory structure; if not, you can use ctypes to call opendir and readdir.

Here is a copy of that code; all I did was indent it properly, add the try/finally block, and fix a bug. You might have to debug it, particularly the struct layout.

Note that this code is not portable. You would need to use different functions on Windows, and I think the structs vary from Unix to Unix.

#!/usr/bin/env python3
"""
An equivalent of os.listdir, but as a generator, using ctypes.
Not portable: the struct layout below assumes Linux/glibc on x86-64.
"""

import os
from ctypes import (CDLL, POINTER, Structure, c_char, c_char_p, c_int,
                    c_long, c_ubyte, c_ulong, c_ushort, get_errno)
from ctypes.util import find_library

class c_dir(Structure):
    """Opaque type for a directory stream, corresponds to DIR"""
    pass
c_dir_p = POINTER(c_dir)

class c_dirent(Structure):
    """Directory entry, corresponds to struct dirent"""
    # FIXME these types match Linux/glibc on x86-64; other platforms differ!
    _fields_ = (
        ('d_ino', c_ulong), # inode number
        ('d_off', c_long), # offset to the next dirent
        ('d_reclen', c_ushort), # length of this record
        ('d_type', c_ubyte), # type of file; not supported by all file system types
        ('d_name', c_char * 256) # NUL-terminated filename
        )
c_dirent_p = POINTER(c_dirent)

# use_errno=True lets get_errno() report why a C call failed
c_lib = CDLL(find_library("c"), use_errno=True)
opendir = c_lib.opendir
opendir.argtypes = [c_char_p]
opendir.restype = c_dir_p

# readdir_r is deprecated in glibc; readdir is fine as long as each
# directory stream is only used from one thread
readdir = c_lib.readdir
readdir.argtypes = [c_dir_p]
readdir.restype = c_dirent_p

closedir = c_lib.closedir
closedir.argtypes = [c_dir_p]
closedir.restype = c_int

def listdir(path):
    """
    A generator to return the names of files in the directory passed in
    """
    dir_p = opendir(os.fsencode(path))
    if not dir_p:
        # opendir returned NULL: report the C errno as an OSError
        err = get_errno()
        raise OSError(err, os.strerror(err), path)
    try:
        while True:
            p = readdir(dir_p)
            if not p:
                # NULL means end of directory (or a read error)
                break
            name = p.contents.d_name  # bytes, read up to the first NUL
            if name not in (b".", b".."):
                yield os.fsdecode(name)
    finally:
        closedir(dir_p)

if __name__ == "__main__":
    for name in listdir("."):
        print(name)
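
On Python 3.6 or later, a portable alternative to the ctypes approach is the standard library's os.scandir, which returns a lazy iterator over directory entries instead of building the whole list first. The following is only a minimal sketch, not part of the original answer; the helper name iter_file_names is made up for illustration.

```python
import os

def iter_file_names(path="."):
    """Yield entry names in `path` one at a time (hypothetical helper)."""
    # os.scandir is a lazy iterator; the with-statement closes the
    # underlying directory handle even if the consumer stops early.
    with os.scandir(path) as entries:
        for entry in entries:
            yield entry.name

if __name__ == "__main__":
    for name in iter_file_names("."):
        print(name)  # processing can start before the listing finishes
```

Like the ctypes version, this skips "." and "..", and it works on Windows as well as Unix.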
