Streaming recursive descent of a directory in Haskell
I am trying to do a recursive descent of a directory structure using Haskell. I would like to only retrieve the child directories and files as needed (lazily).
I wrote the following code, but when I run it, the trace shows that all directories are visited before the first file:
module Main where

import Control.Monad ( forM, forM_, liftM )
import Debug.Trace ( trace )
import System.Directory ( doesDirectoryExist, getDirectoryContents )
import System.Environment ( getArgs )
import System.FilePath ( (</>) )

-- From Real World Haskell, p. 214
getRecursiveContents :: FilePath -> IO [FilePath]
getRecursiveContents topPath = do
  names <- getDirectoryContents topPath
  let
    properNames =
      filter (`notElem` [".", ".."]) $
      trace ("Processing " ++ topPath) names
  paths <- forM properNames $ \name -> do
    let path = topPath </> name
    isDirectory <- doesDirectoryExist path
    if isDirectory
      then getRecursiveContents path
      else return [path]
  return (concat paths)

main :: IO ()
main = do
  [path] <- getArgs
  files <- getRecursiveContents path
  forM_ files $ \file -> putStrLn $ "Found file " ++ file
How can I interleave the file processing with the descent? Is the problem that the files <- getRecursiveContents path action gets performed before the following forM_ in main?
This is exactly the kind of problem that iteratees/coroutines were designed to solve.
You can easily do this with pipes. The only change I made to your getRecursiveContents was to make it a Producer of FilePaths and to respond with the file name instead of returning it. This lets downstream handle the file name immediately instead of waiting for getRecursiveContents to complete.
module Main where

import Control.Monad ( forM_, liftM )
import Control.Proxy
import System.Directory ( doesDirectoryExist, getDirectoryContents )
import System.Environment ( getArgs )
import System.FilePath ( (</>) )

getRecursiveContents :: (Proxy p) => FilePath -> () -> Producer p FilePath IO ()
getRecursiveContents topPath () = runIdentityP $ do
  names <- lift $ getDirectoryContents topPath
  let properNames = filter (`notElem` [".", ".."]) names
  forM_ properNames $ \name -> do
    let path = topPath </> name
    isDirectory <- lift $ doesDirectoryExist path
    if isDirectory
      then getRecursiveContents path ()
      else respond path

main :: IO ()
main = do
  [path] <- getArgs
  runProxy $
    getRecursiveContents path
    >-> useD (\file -> putStrLn $ "Found file " ++ file)
This prints out each file immediately as it traverses the tree, and it does not require lazy IO. It's also very easy to change what you do with the file names, since all you have to do is switch out the useD stage with your actual file handling logic.
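For comparison, the same interleaving can be sketched without any streaming library by passing the file handler into the traversal. The name walkFiles is a hypothetical helper introduced here for illustration; it is not part of pipes or of the code above. It only uses the directory and filepath functions already imported in the question.

```haskell
module Main where

import Control.Monad ( forM_ )
import System.Directory ( doesDirectoryExist, getDirectoryContents )
import System.Environment ( getArgs )
import System.FilePath ( (</>) )

-- Hypothetical helper (an assumption for illustration, not from the answer):
-- walk the tree under topPath and run handler on each file as soon as it
-- is found, instead of collecting all paths into a list first.
walkFiles :: FilePath -> (FilePath -> IO ()) -> IO ()
walkFiles topPath handler = do
  names <- getDirectoryContents topPath
  let properNames = filter (`notElem` [".", ".."]) names
  forM_ properNames $ \name -> do
    let path = topPath </> name
    isDirectory <- doesDirectoryExist path
    if isDirectory
      then walkFiles path handler  -- descend before moving to the next sibling
      else handler path            -- process the file immediately

main :: IO ()
main = do
  [path] <- getArgs
  walkFiles path $ \file -> putStrLn $ "Found file " ++ file
```

The trade-off is that the handler is baked into the traversal: it cannot stop the walk early or be composed with other stages, which is what the Producer formulation provides.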
To learn more about pipes, I highly recommend you read Control.Proxy.Tutorial.
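Note that Control.Proxy belongs to the pipes 3.x API. As far as I know, pipes 4.x replaced it with the Pipes module, where a Producer no longer takes the proxy type parameter or the extra () argument, respond becomes yield, and runProxy becomes runEffect. Below is a rough sketch of the same program under that assumption; check it against the version of pipes you have installed.

```haskell
module Main where

import Control.Monad ( forM_ )
import Pipes
import System.Directory ( doesDirectoryExist, getDirectoryContents )
import System.Environment ( getArgs )
import System.FilePath ( (</>) )

-- Same traversal as above, written against the pipes 4.x API (assumed).
getRecursiveContents :: FilePath -> Producer FilePath IO ()
getRecursiveContents topPath = do
  names <- lift $ getDirectoryContents topPath
  let properNames = filter (`notElem` [".", ".."]) names
  forM_ properNames $ \name -> do
    let path = topPath </> name
    isDirectory <- lift $ doesDirectoryExist path
    if isDirectory
      then getRecursiveContents path  -- recurse, yielding files as they appear
      else yield path                 -- hand the file name downstream right away

main :: IO ()
main = do
  [path] <- getArgs
  -- 'for' runs the body once per yielded file name, taking the role of useD.
  runEffect $ for (getRecursiveContents path) $ \file ->
    lift $ putStrLn $ "Found file " ++ file
```

Changing what happens to each file name then means replacing the body passed to for.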