Streaming recursive descent of a directory in Haskell


Problem description

I am trying to do a recursive descent of a directory structure using Haskell. I would like to only retrieve the child directories and files as needed (lazily).

I wrote the following code, but when I run it, the trace shows that all directories are visited before the first file:

module Main where

import Control.Monad ( forM, forM_, liftM )
import Debug.Trace ( trace )
import System.Directory ( doesDirectoryExist, getDirectoryContents )
import System.Environment ( getArgs )
import System.FilePath ( (</>) )

-- From Real World Haskell, p. 214
getRecursiveContents :: FilePath -> IO [FilePath]
getRecursiveContents topPath = do
  names <- getDirectoryContents topPath
  let
    properNames =
      filter (`notElem` [".", ".."]) $
      trace ("Processing " ++ topPath) names
  paths <- forM properNames $ \name -> do
    let path = topPath </> name
    isDirectory <- doesDirectoryExist path
    if isDirectory
      then getRecursiveContents path
      else return [path]
  return (concat paths)

main :: IO ()
main = do
  [path] <- getArgs
  files <- getRecursiveContents path
  forM_ files $ \file -> putStrLn $ "Found file " ++ file

How can I interleave the file processing with the descent? Is the problem that the files <- getRecursiveContents path action gets performed before the following forM_ in main?

Solution

This is exactly the kind of problem that iteratees/coroutines were designed to solve.
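As for why the original version visits every directory first: in IO, forM runs all of its actions to completion before it returns the result list, so the whole traversal finishes before the forM_ in main starts. Here is a minimal sketch of that ordering. The visit function is a hypothetical stand-in for the directory access and is not part of the question's code.

```haskell
module Main where

import Control.Monad (forM, forM_)

-- Hypothetical stand-in for visiting a directory: log, then return a result.
visit :: Int -> IO Int
visit n = do
  putStrLn ("visiting " ++ show n)
  return n

main :: IO ()
main = do
  -- forM performs every visit before `results` is bound...
  results <- forM [1, 2, 3] visit
  -- ...so all "visiting" lines are printed before any "processing" line.
  forM_ results $ \r -> putStrLn ("processing " ++ show r)
```

Running this prints the three "visiting" lines first and the three "processing" lines afterwards, which is the same pattern the trace shows for getRecursiveContents.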

You can easily do this with pipes. The only change I made to your getRecursiveContents was to make it a Producer of FilePaths and to respond with the file name instead of returning it. This lets downstream handle the file name immediately instead of waiting for getRecursiveContents to complete.

module Main where

import Control.Monad ( forM_, liftM )
import Control.Proxy
import System.Directory ( doesDirectoryExist, getDirectoryContents )
import System.Environment ( getArgs )
import System.FilePath ( (</>) )

getRecursiveContents :: (Proxy p) => FilePath -> () -> Producer p FilePath IO ()
getRecursiveContents topPath () = runIdentityP $ do
  names <- lift $ getDirectoryContents topPath
  let properNames = filter (`notElem` [".", ".."]) names
  forM_ properNames $ \name -> do
    let path = topPath </> name
    isDirectory <- lift $ doesDirectoryExist path
    if isDirectory
      then getRecursiveContents path ()
      else respond path

main :: IO ()
main = do
    [path] <- getArgs
    runProxy $
            getRecursiveContents path
        >-> useD (\file -> putStrLn $ "Found file " ++ file)

This prints out each file immediately as it traverses the tree, and it does not require lazy IO. It's also very easy to change what you do with the file names, since all you have to do is switch out the useD stage with your actual file handling logic.
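For comparison, the same interleaving can be had without any streaming library by passing the file handler into the traversal. This is a different technique from the pipes version above, shown only as a sketch. The name walkFiles is hypothetical, and this approach ties the traversal to a single callback rather than producing a stream that can be composed with other stages.

```haskell
module Main where

import Control.Monad (forM_)
import System.Directory (doesDirectoryExist, getDirectoryContents)
import System.Environment (getArgs)
import System.FilePath ((</>))

-- Hypothetical helper (not from the answer): walk the tree and run
-- `handler` on each file as soon as it is found.
walkFiles :: (FilePath -> IO ()) -> FilePath -> IO ()
walkFiles handler topPath = do
  names <- getDirectoryContents topPath
  let properNames = filter (`notElem` [".", ".."]) names
  forM_ properNames $ \name -> do
    let path = topPath </> name
    isDirectory <- doesDirectoryExist path
    if isDirectory
      then walkFiles handler path  -- descend into the subdirectory
      else handler path            -- process the file immediately

main :: IO ()
main = do
  [path] <- getArgs
  walkFiles (\file -> putStrLn $ "Found file " ++ file) path
```

Changing what happens to each file means passing a different handler, much as the pipes version swaps out its downstream stage.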

To learn more about pipes, I highly recommend you read Control.Proxy.Tutorial.
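Note that Control.Proxy belongs to the pipes 3.x series. From pipes 4.0 onward that module was replaced by Pipes, where respond in a Producer is written yield and the Proxy p constraint, the extra () argument, and runIdentityP are no longer needed. Below is a sketch of how the example above might look, assuming pipes 4.x is installed. It is an adaptation and not the code from the original answer.

```haskell
module Main where

import Control.Monad (forM_)
import Pipes (Producer, for, lift, runEffect, yield)
import System.Directory (doesDirectoryExist, getDirectoryContents)
import System.Environment (getArgs)
import System.FilePath ((</>))

-- Assumes pipes 4.x: a Producer that yields each file path as it is found.
getRecursiveContents :: FilePath -> Producer FilePath IO ()
getRecursiveContents topPath = do
  names <- lift $ getDirectoryContents topPath
  let properNames = filter (`notElem` [".", ".."]) names
  forM_ properNames $ \name -> do
    let path = topPath </> name
    isDirectory <- lift $ doesDirectoryExist path
    if isDirectory
      then getRecursiveContents path  -- descend, yielding files along the way
      else yield path                 -- hand the file downstream right away

main :: IO ()
main = do
  [path] <- getArgs
  -- `for` runs the body once per yielded path, interleaved with the descent.
  runEffect $ for (getRecursiveContents path) $ \file ->
    lift $ putStrLn $ "Found file " ++ file
```

The body of the for loop plays the role of the useD stage: replace it with the actual file handling logic.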
