List files in a folder as a stream to begin processing immediately


Problem description





I have a folder with 1 million files in it.

I would like to begin processing immediately, while the files in this folder are still being listed, in Python or another scripting language.

The usual functions (os.listdir in Python...) are blocking, and my program has to wait for the whole listing to finish, which can take a long time.

What's the best way to list huge folders?

Solution

If convenient, change your directory structure; but if not, you can use ctypes to call opendir and readdir.
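
As an illustration of the first option, one common way to restructure is to spread the files over subdirectories keyed by a hash of the file name, so that no single directory grows huge. The following is only a sketch under that assumption: the helper names `sharded_path` and `move_into_shard`, the use of SHA-256, and the two-level, two-character layout are all hypothetical choices, not something the original answer specifies.

```python
import hashlib
import os
import shutil

def sharded_path(root, filename, levels=2, width=2):
    """Return root/ab/cd/filename, where 'abcd' starts a hash of filename.

    Hypothetical layout: `levels` nested directories, each named after
    `width` hex characters of the SHA-256 digest of the file name.
    """
    digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, filename)

def move_into_shard(src, root):
    """Move the file at src into its shard directory under root."""
    dest = sharded_path(root, os.path.basename(src))
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    shutil.move(src, dest)
    return dest
```

With two levels of two hex characters there are 65,536 possible leaf directories, so 1 million files would average roughly 15 per directory, and the location of any file can be recomputed from its name alone.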

Here is a copy of code that calls opendir and readdir through ctypes; all I did was indent it properly, add the try/finally block, and fix a bug. You might have to debug it, particularly the struct layout.

Note that this code is not portable. You would need to use different functions on Windows, and I think the structs vary from Unix to Unix.

#!/usr/bin/env python3
"""
An equivalent of os.listdir, but as a generator using ctypes
"""

import ctypes
import os
from ctypes import CDLL, c_char_p, c_int, c_long, c_ushort, c_byte, c_char, Structure, POINTER
from ctypes.util import find_library

class c_dir(Structure):
    """Opaque type for directory streams, corresponds to struct DIR"""
    pass
c_dir_p = POINTER(c_dir)

class c_dirent(Structure):
    """Directory entry"""
    # FIXME not sure these are the exactly correct types!
    # The layout follows struct dirent as declared by glibc on Linux;
    # other Unixes order and size these fields differently.
    _fields_ = (
        ('d_ino', c_long), # inode number
        ('d_off', c_long), # offset to the next dirent
        ('d_reclen', c_ushort), # length of this record
        ('d_type', c_byte), # type of file; not supported by all file system types
        ('d_name', c_char * 4096) # filename, NUL-terminated
        )
c_dirent_p = POINTER(c_dirent)

# use_errno=True lets ctypes.get_errno() report why a libc call failed
c_lib = CDLL(find_library("c"), use_errno=True)
opendir = c_lib.opendir
opendir.argtypes = [c_char_p]
opendir.restype = c_dir_p

# readdir_r is deprecated in current glibc; plain readdir is adequate
# as long as a single DIR stream is not shared between threads.
readdir = c_lib.readdir
readdir.argtypes = [c_dir_p]
readdir.restype = c_dirent_p

closedir = c_lib.closedir
closedir.argtypes = [c_dir_p]
closedir.restype = c_int

def listdir(path):
    """
    A generator to return the names of files in the directory passed in
    """
    # opendir expects a C string, so encode str paths to bytes first
    dir_p = opendir(os.fsencode(path))
    if not dir_p:
        # opendir returns NULL on failure; surface errno as an OSError
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), path)
    try:
        while True:
            p = readdir(dir_p)
            if not p:
                break
            name = p.contents.d_name  # bytes, up to the first NUL
            if name not in (b".", b".."):
                yield os.fsdecode(name)
    finally:
        closedir(dir_p)

if __name__ == "__main__":
    for name in listdir("."):
        print(name)
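
On Python 3.5 and later, the standard library's os.scandir returns an iterator of directory entries rather than a complete list, so it may be a simpler and more portable way to get the same streaming behaviour without ctypes. The sketch below assumes Python 3.6 or later (for using the iterator in a `with` statement); the function name `iter_files` is made up for illustration.

```python
import os

def iter_files(path):
    """Yield the names of regular files in path, one at a time.

    os.scandir yields os.DirEntry objects lazily and never includes
    the special entries '.' and '..'.
    """
    with os.scandir(path) as entries:
        for entry in entries:
            # Skip subdirectories and symlinks; keep regular files only
            if entry.is_file(follow_symlinks=False):
                yield entry.name
```

A loop such as `for name in iter_files("."): print(name)` can then start handling names as soon as the first entries are read, instead of waiting for the whole directory to be listed.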
