How to list directories faster?


Problem description

I have a few situations where I need to list files recursively, but my implementations have been slow. I have a directory structure with 92784 files. find lists the files in less than 0.5 seconds, but my Haskell implementation is a lot slower.

My first implementation took a bit over 9 seconds to complete, the next version a bit over 5 seconds, and I'm currently down to a bit less than 2 seconds.

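import Control.Monad (forM)
import System.Directory (doesDirectoryExist, getDirectoryContents)
import System.FilePath ((</>))

listFilesR :: FilePath -> IO [FilePath]
listFilesR path =
    let
        -- Skip the "." and ".." entries so the recursion terminates.
        isDODD "." = False
        isDODD ".." = False
        isDODD _ = True
    in do
        allfiles <- getDirectoryContents path
        dirs <- forM allfiles $ \d ->
            if isDODD d
                then do
                    let p = path </> d
                    isDir <- doesDirectoryExist p
                    -- Recurse into directories; a file yields its bare name.
                    if isDir then listFilesR p else return [d]
                else return []
        return $ concat dirs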

The test uses about 100 megabytes of memory (measured with +RTS -s), and the program spends around 40% of its time in GC.

I was thinking of doing the listing in a WriterT monad with Sequence as the monoid to prevent the concats and list creation. Is it likely this helps? What else should I do?
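For comparison, here is a minimal sketch of the accumulator idea without WriterT: a single Data.Sequence value is threaded through the traversal with foldM, so no per-directory lists are built and concatenated. The name listFilesSeq is hypothetical, and unlike listFilesR it returns full paths rather than bare file names. Whether this actually beats the list version would have to be measured; it only shows the shape of the approach.

```haskell
import Control.Monad (foldM)
import Data.Foldable (toList)
import Data.Sequence (Seq, (|>))
import qualified Data.Sequence as Seq
import System.Directory (doesDirectoryExist, getDirectoryContents)
import System.FilePath ((</>))

-- Hypothetical helper: collect every non-directory path below `root`.
listFilesSeq :: FilePath -> IO [FilePath]
listFilesSeq root = toList <$> go Seq.empty root
  where
    -- Thread one accumulator through the whole traversal instead of
    -- building a list per directory and concatenating afterwards.
    go :: Seq FilePath -> FilePath -> IO (Seq FilePath)
    go acc dir = do
      entries <- getDirectoryContents dir
      foldM (visit dir) acc entries

    visit :: FilePath -> Seq FilePath -> FilePath -> IO (Seq FilePath)
    visit dir acc name
      | name == "." || name == ".." = return acc
      | otherwise = do
          let p = dir </> name
          isDir <- doesDirectoryExist p
          -- Append files to the right end; recurse into directories.
          if isDir then go acc p else return $! acc |> p
```

Note that getDirectoryContents still materialises each directory's entries as a list, so this only removes the concat step, not the per-directory allocation.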

Edit: I have changed the function to use readDirStream, which helps keep memory usage down. There is still some allocation happening, but productivity is now above 95% and it runs in less than a second.

This is the current version:

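import System.Directory (doesDirectoryExist)
import System.FilePath ((</>))
import System.Posix.Directory (closeDirStream, openDirStream, readDirStream)

list :: FilePath -> IO ()
list path = do
  de <- openDirStream path
  readDirStream de >>= go de
  closeDirStream de
  where
    -- readDirStream returns "" once the stream is exhausted.
    go _ [] = return ()
    go d "." = readDirStream d >>= go d
    go d ".." = readDirStream d >>= go d
    go d x = do
      let newpath = path </> x
      e <- doesDirectoryExist newpath
      if e
        then list newpath >> readDirStream d >>= go d
        else putStrLn newpath >> readDirStream d >>= go d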

Solution

I think that System.Directory.getDirectoryContents constructs the whole list of entries at once and therefore uses a lot of memory. How about using System.Posix.Directory? System.Posix.Directory.readDirStream returns the entries one at a time.
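As a rough illustration of reading entries one at a time, here is a sketch that walks a tree with readDirStream and hands each file path to a callback instead of collecting a list. The names walkFiles and action are hypothetical. Using bracket is my own addition so that the stream is closed even if the callback throws. It assumes a POSIX system with the unix package available.

```haskell
import Control.Exception (bracket)
import Control.Monad (unless)
import System.Directory (doesDirectoryExist)
import System.FilePath ((</>))
import System.Posix.Directory (closeDirStream, openDirStream, readDirStream)

-- Hypothetical helper: run `action` on every non-directory path below `root`.
walkFiles :: (FilePath -> IO ()) -> FilePath -> IO ()
walkFiles action root =
  -- bracket closes the directory stream even if `action` throws.
  bracket (openDirStream root) closeDirStream loop
  where
    loop ds = do
      name <- readDirStream ds
      -- readDirStream signals the end of the stream with an empty name.
      unless (null name) $ do
        visit name
        loop ds

    visit name
      | name == "." || name == ".." = return ()
      | otherwise = do
          let p = root </> name
          isDir <- doesDirectoryExist p
          -- Recurse into directories; pass everything else to the callback.
          if isDir then walkFiles action p else action p
```

For example, walkFiles putStrLn "." prints every file path below the current directory. One stream stays open per level of nesting while the recursion is in progress, and because doesDirectoryExist follows symbolic links, a link that points back up the tree would make the walk loop.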

Also, the FileManip library might be useful, although I have never used it.
