How to list directories faster?
Question
I have a few situations where I need to list files recursively, but my implementations have been slow. I have a directory structure with 92784 files. The find command lists them in less than 0.5 seconds, but my Haskell implementation is a lot slower.
My first implementation took a bit over 9 seconds to complete, next version a bit over 5 seconds and I'm currently down to a bit less than two seconds.
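```haskell
import Control.Monad (forM)
import System.Directory (doesDirectoryExist, getDirectoryContents)
import System.FilePath ((</>))

listFilesR :: FilePath -> IO [FilePath]
listFilesR path = do
    allfiles <- getDirectoryContents path
    dirs <- forM allfiles $ \d ->
      if isDODD d
        then do
          let p = path </> d
          isDir <- doesDirectoryExist p
          -- recurse into subdirectories, otherwise return the full path
          if isDir then listFilesR p else return [p]
        else return []
    return $ concat dirs
  where
    -- skip the "." and ".." entries
    isDODD "." = False
    isDODD ".." = False
    isDODD _ = True
```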
The test uses about 100 MB of memory (as reported by +RTS -s), and the program spends around 40% of its time in GC.
I was thinking of doing the listing in a WriterT monad with Sequence as the monoid to avoid the concats and the list creation. Is this likely to help? What else should I do?
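A minimal sketch of the WriterT idea, assuming the transformers, containers, directory and filepath packages; walk and listFilesW are hypothetical names, and no timings are claimed for it. Each file path is told into a Seq accumulator instead of building per-directory lists that are concatenated afterwards.

```haskell
import Control.Monad (forM_)
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Writer.Strict (WriterT, execWriterT, tell)
import Data.Foldable (toList)
import qualified Data.Sequence as Seq
import System.Directory (doesDirectoryExist, getDirectoryContents)
import System.Environment (getArgs)
import System.FilePath ((</>))

-- Walk the tree, appending each non-directory path to a Seq
-- instead of returning lists that must be concatenated.
walk :: FilePath -> WriterT (Seq.Seq FilePath) IO ()
walk path = do
  entries <- lift (getDirectoryContents path)
  forM_ (filter (`notElem` [".", ".."]) entries) $ \d -> do
    let p = path </> d
    isDir <- lift (doesDirectoryExist p)
    if isDir then walk p else tell (Seq.singleton p)

-- Run the writer and convert the accumulated Seq back to a list.
listFilesW :: FilePath -> IO [FilePath]
listFilesW root = toList <$> execWriterT (walk root)

main :: IO ()
main = do
  args <- getArgs
  let root = case args of
        (r : _) -> r
        [] -> "."
  files <- listFilesW root
  print (length files)
```

One caveat: the Strict WriterT in transformers does not force the accumulated monoid on each bind, so thunks may still build up; Control.Monad.Trans.Writer.CPS (transformers 0.5.6 and later) is meant to avoid that. getDirectoryContents also still returns a full list for each directory, so this only removes the concat step.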
Edit: I have changed the function to use readDirStream, and that helps keep memory usage down. Some allocation still happens, but productivity is now above 95% and it runs in less than a second.
This is the current version:
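```haskell
import System.Directory (doesDirectoryExist)
import System.FilePath ((</>))
import System.Posix.Directory (closeDirStream, openDirStream, readDirStream)

-- Print every non-directory path below `path`, reading one entry at a time.
list :: FilePath -> IO ()
list path = do
    de <- openDirStream path
    readDirStream de >>= go de
    closeDirStream de
  where
    -- readDirStream returns "" once the stream is exhausted
    go _ [] = return ()
    go d "." = readDirStream d >>= go d
    go d ".." = readDirStream d >>= go d
    go d x = let newpath = path </> x
             in do
                 e <- doesDirectoryExist newpath
                 if e
                   then list newpath >> readDirStream d >>= go d
                   else putStrLn newpath >> readDirStream d >>= go d
```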
Answer
I think System.Directory.getDirectoryContents constructs the whole list at once and therefore uses a lot of memory. How about using System.Posix.Directory instead? System.Posix.Directory.readDirStream returns entries one at a time.
Also, the FileManip library might be useful, although I have never used it.
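To illustrate the readDirStream suggestion, here is a sketch of a streaming walk that hands each file path to a callback instead of printing it, so the caller decides what to do with each entry without a list ever being built. It is POSIX only and assumes the unix, directory and filepath packages; walkStream and countFiles are hypothetical names. bracket ensures closeDirStream runs even if the callback throws.

```haskell
import Control.Exception (bracket)
import Data.IORef (modifyIORef', newIORef, readIORef)
import System.Directory (doesDirectoryExist)
import System.Environment (getArgs)
import System.FilePath ((</>))
import System.Posix.Directory (closeDirStream, openDirStream, readDirStream)

-- Visit every non-directory path below `root`, one directory entry at a time.
-- bracket closes the DirStream even if `visit` raises an exception.
walkStream :: (FilePath -> IO ()) -> FilePath -> IO ()
walkStream visit root =
  bracket (openDirStream root) closeDirStream loop
  where
    loop ds = do
      name <- readDirStream ds
      case name of
        "" -> return ()            -- "" signals the end of the stream
        "." -> loop ds
        ".." -> loop ds
        _ -> do
          let p = root </> name
          isDir <- doesDirectoryExist p
          if isDir then walkStream visit p else visit p
          loop ds

-- Count files with a single counter instead of building a list.
countFiles :: FilePath -> IO Int
countFiles root = do
  ref <- newIORef (0 :: Int)
  walkStream (\_ -> modifyIORef' ref (+ 1)) root
  readIORef ref

main :: IO ()
main = do
  args <- getArgs
  let root = case args of
        (r : _) -> r
        [] -> "."
  countFiles root >>= print
```

Like the version in the question, doesDirectoryExist follows symbolic links, so a symlink cycle would make this walk recurse indefinitely.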