按广度优先顺序列出目录的所有内容会导致效率低下 [英] Listing all the contents of a directory by breadth-first order results in low efficiency

查看:150
本文介绍了按广度优先顺序列出目录的所有内容会导致效率低下的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我写了一个Haskell模块,按宽度优先顺序列出目录的所有内容。以下是源代码。

I writed a Haskell module to list all the contents of a directory by breadth-first order. The below is the source code.

module DirElements (dirElem) where

import System.Directory (getDirectoryContents, doesDirectoryExist)
import System.FilePath ((</>))

dirElem :: FilePath -> IO [[FilePath]]
dirElem dirPath = iterateM (not.null) (concatMapM getDirectoryContents') [dirPath] >>= return.tail

getDirectoryContents' :: FilePath -> IO [FilePath]
getDirectoryContents' dirPath = do
  isDir <- do doesDirectoryExist dirPath
  if isDir then dirContent else return [] where
    dirContent = do
      contents <- getDirectoryContents dirPath
      return.(map (dirPath</>)).tail.tail $ contents

iterateM :: (Monad m) => (a -> Bool) -> (a -> m a) -> a -> m [a]
iterateM fb f x = do --Notice: Due to the the implementation of >>=, iterateM can't be writen like iterate which gives a infinite list and have type of iterateM :: (Monad m) => (a -> Bool) -> (a -> m a) -> a -> m [a]
  if fb x
    then do
      tail <- do {fx <- f x; iterateM fb f fx}
      return (x:tail)
    else return []

concatMapM :: Monad m => (a -> m[b]) -> [a] -> m[b]
concatMapM f list = mapM f list >>= return.concat

它工作正常,但在大型目录上执行时,它会暂停暂停一段时间,然后弹出所有结果。

It works correct but when performing on a large directory, it will "suspend" for a little while, and spring out all the results.

经过研究后我发现它与序列$ map return [1 ..] :: [[Int]] 请参阅为什么Haskell序列函数不能延迟或为什么递归monadic函数不能是懒惰的?

After a research I find it is the same question with sequence $ map return [1..]::[[Int]] see Why the Haskell sequence function can't be lazy or why recursive monadic functions can't be lazy

推荐答案

我修改了Davorak链接的旧答案,使用新的管道库。

I modified the older answer that Davorak linked to to use the new pipes library.

它使用 StateP 保留未遍历目录的队列,以便它可以进行广度优先遍历。为方便起见,它使用 MaybeP 退出循环。

It uses StateP to keep a queue of untraversed directories so that it can do a breadth first traversal. It uses MaybeP for exiting from the loop, as a convenience.

import Control.Monad
import Control.Proxy
import Control.Proxy.Trans.Maybe
import Control.Proxy.Trans.State as S
import Data.Sequence hiding (filter)
import System.FilePath.Posix
import System.Directory

getUsefulContents :: FilePath -> IO [FilePath]
getUsefulContents path
  = fmap (filter (`notElem` [".", ".."])) $ getDirectoryContents path

traverseTree
    :: (Proxy p)
    => FilePath
    -> () -> Producer (MaybeP (StateP (Seq FilePath) p)) FilePath IO r
traverseTree path () = do
    liftP $ S.modify (|> path)
    forever $ do
        x <- liftP $ S.gets viewl
        case x of
            EmptyL    -> mzero
            file :< s -> do
                liftP $ S.put s
                respond file
                p <- lift $ doesDirectoryExist file
                when p $ do
                    names <- lift $ getUsefulContents file
                    let namesfull = map (file </>) names
                    liftP $ forM_ namesfull $ \name ->
                        S.modify (|> name)

这定义了一个广度优先的懒惰生产者的文件。如果你将它连接到打印阶段,它将在遍历树时打印出文件:

This defines a breadth-first lazy producer of files. If you hook it up to a printing stage, it will print out the files as it traverses the tree:

main = runProxy $ evalStateK empty $ runMaybeK $
    traverseTree "/tmp" >-> putStrLnD

懒惰意味着如果您只需要3个文件,它只会根据需要遍历树要生成三个文件,它将停止:

Laziness means that if you only demand 3 files, it will only traverse the tree as much as necessary to generate three files, then it will stop:

    main = runProxy $ evalStateK empty $ runMaybeK $
        traverseTree "/tmp" >-> takeB_ 3 >-> putStrLnD

如果您想了解更多关于 pipes library ,然后我建议你阅读教程

If you want to learn more about the pipes library, then I recommend you read the tutorial.

这篇关于按广度优先顺序列出目录的所有内容会导致效率低下的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆