目录树的广度优先遍历不是懒惰的 [英] breadth-first traversal of directory tree is not lazy
问题描述
我试图遍历目录树。幼稚的深度优先遍历似乎不会以懒惰的方式产生数据并且耗尽内存。我接下来尝试了第一种宽度的方法,它显示了同样的问题 - 它使用所有可用的内存然后崩溃。
我的代码是:
getFilePathBreadtFirst :: FilePath - > IO [FilePath]
getFilePathBreadtFirst fp = do
fileinfo< - getInfo fp
res :: [FilePath]< - if isReadableDirectory fileinfo
then do
children< ; - getChildren fp
lower < - mapM getFilePathBreadtFirst children
return(children ++ concat lower)
else return [fp] - 应该只返回文件吗?
返回资源
getChildren :: FilePath - > IO [FilePath]
getChildren path = do
名称< - getUsefulContents路径
let namesfull = map(路径< />)名称
返回namesfull
testBF fn = do - 崩溃/ home / frank,不去掉
fps< - getFilePathBreadtFirst fn
putStrLn $ unlines fps
我认为所有的代码都是线性的或者是尾递归的,我期望文件名列表立即开始,但实际上并不是。我的代码中的错误和我的想法在哪里?我在哪里失去了懒惰的评价?
我会用三个独立的技巧来解决你的问题。 b 以下代码将这三种技巧结合在一个monad变量堆栈中。 这将创建一个可以消费的广度优先文件名与树遍历同时进行。您可以使用以下值来使用这些值: 您甚至可以选择不要求所有的值: 更重要的是,最后一个例子只会根据需要遍历树来生成三个文件,那么它会停止。这可以防止浪费地遍历整个树,当你想要的只有3个结果! 了解更多关于 要详细了解循环技巧,请阅读博客文章。 我无法为广度优先遍历的队列技巧找到一个好的链接,但我知道它在某处。如果其他人知道这个好链接,只需编辑我的答案来添加它。 I try to traverse the directory tree. A naive depth-first traversal seems not to produce the data in a lazy fashion and runs out of memory. I next tried a breadth first approach, which shows the same problem - it uses all the memory available and then crashes. the code I have is: I think all the code is either linear or tail recursive, and I would expect that the listing of filenames starts immediately, but in fact it does not. Where is the error in my code and my thinking? Where have I lost lazy evaluation? I will use three separate tricks to solve your question. The following code combines these three tricks in one monad transformer stack. This creates a generator of breadth-first file names that can be consumed concurrent with the tree traversal. You consume the values using: You can even choose to not demand all the values: More importantly, that last example will only traverse the tree as much as necessary to generate the three files and then it will stop. This prevents wastefully traversing the entire tree when all you wanted was 3 results! To learn more about the To learn more about the loop trick, read this blog post. I couldn't find a good link for the queue trick for breadth first traversal, but I know it's out there somewhere. If somebody else knows a good link for this, just edit my answer to add it. 这篇关于目录树的广度优先遍历不是懒惰的的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!
管道
MaybeT
变换器来避免手动递归, 。
import Control.Monad
import Control.Monad.Trans
import Control.Monad.Trans.Maybe
import Control.Monad.State.Lazy
import Control.Pipe
import Data.Sequence
import System.FilePath.Posix
import System.Directory
loop ::(Monad m)=> ; MaybeT m a - > m()
loop = liftM(maybe()id)。 runMaybeT。永远
quit ::(Monad m)=> MaybeT m a
quit = mzero
getUsefulContents :: FilePath - > IO [FilePath]
getUsefulContents路径
= fmap(filter(`notElem` [。,..]))$ getDirectoryContents路径
允许:: FilePath - > IO Bool
允许的文件
= fmap(\ p - >可读的p&&&搜索p)$ getPermissions文件
traverseTree :: FilePath - >生产者FilePath IO()
traverseTree path =(``evalStateT` empty)$ loop $ do
- 过去这一点的所有代码都使用以下monad变量堆栈:
- MaybeT(StateT Seq FilePath)(Producer FilePath IO))()
let liftState = lift
liftPipe = lift。提升
liftIO =提升。电梯 。 lift
liftState $ modify(|>路径)
永远$ do
x< - liftState $获取viewl
case x of
EmptyL - >退出
文件:< s - > do
liftState $ put s
liftPipe $ yield文件
p< - liftIO $ doesDirectoryExist文件
当p $ do
名称< - liftIO $ getUsefulContents文件
- allowedNames< - filterM允许的名称
let namesfull = map(路径< />)名称
liftState $ forM_ namesfull $ \\\
ame - >修改(|>名称)
printer ::(Show a)=>消费者a IO r
printer = forever $ do
a< - 等待
lift $打印
>>> runPipe $ printer< +<遍历树路径
<在遍历树时打印文件名称>
- 仅需'n'元素
以'::(Monad m)=> Int - > Pipe a a m()
take'n = replicateM_ n $ do
a< - await
yield a
>> runPipe $ printer< +<采取'3 +<遍历树路径
<仅打印三个文件>
管道
库请参阅管道教程 at Control.Pipes.Tutorial
。
getFilePathBreadtFirst :: FilePath -> IO [FilePath]
getFilePathBreadtFirst fp = do
fileinfo <- getInfo fp
res :: [FilePath] <- if isReadableDirectory fileinfo
then do
children <- getChildren fp
lower <- mapM getFilePathBreadtFirst children
return (children ++ concat lower)
else return [fp] -- should only return the files?
return res
getChildren :: FilePath -> IO [FilePath]
getChildren path = do
names <- getUsefulContents path
let namesfull = map (path </>) names
return namesfull
testBF fn = do -- crashes for /home/frank, does not go to swap
fps <- getFilePathBreadtFirst fn
putStrLn $ unlines fps
pipes
library to stream file names concurrent with traversing the tree.StateT (Seq FilePath)
transformer to achieve a breadth-first traversal.MaybeT
transformer to avoid manual recursion when writing the loop and exit.import Control.Monad
import Control.Monad.Trans
import Control.Monad.Trans.Maybe
import Control.Monad.State.Lazy
import Control.Pipe
import Data.Sequence
import System.FilePath.Posix
import System.Directory
loop :: (Monad m) => MaybeT m a -> m ()
loop = liftM (maybe () id) . runMaybeT . forever
quit :: (Monad m) => MaybeT m a
quit = mzero
getUsefulContents :: FilePath -> IO [FilePath]
getUsefulContents path
= fmap (filter (`notElem` [".", ".."])) $ getDirectoryContents path
permissible :: FilePath -> IO Bool
permissible file
= fmap (\p -> readable p && searchable p) $ getPermissions file
traverseTree :: FilePath -> Producer FilePath IO ()
traverseTree path = (`evalStateT` empty) $ loop $ do
-- All code past this point uses the following monad transformer stack:
-- MaybeT (StateT (Seq FilePath) (Producer FilePath IO)) ()
let liftState = lift
liftPipe = lift . lift
liftIO = lift . lift . lift
liftState $ modify (|> path)
forever $ do
x <- liftState $ gets viewl
case x of
EmptyL -> quit
file :< s -> do
liftState $ put s
liftPipe $ yield file
p <- liftIO $ doesDirectoryExist file
when p $ do
names <- liftIO $ getUsefulContents file
-- allowedNames <- filterM permissible names
let namesfull = map (path </>) names
liftState $ forM_ namesfull $ \name -> modify (|> name)
printer :: (Show a) => Consumer a IO r
printer = forever $ do
a <- await
lift $ print a
>>> runPipe $ printer <+< traverseTree path
<Prints file names as it traverses the tree>
-- Demand only 'n' elements
take' :: (Monad m) => Int -> Pipe a a m ()
take' n = replicateM_ n $ do
a <- await
yield a
>> runPipe $ printer <+< take' 3 <+< traverseTree path
<Prints only three files>
pipes
library trick, consult the pipes tutorial at Control.Pipes.Tutorial
.