How to implement search in file system in Haskell?

Question

I'm not exactly new to Haskell, but I haven't used it much in the real world.

So what I want to do is find all git repositories starting from some folders. Basically, I'm trying to do the equivalent of

find . -type d -exec test -e '{}/.git' ';' -print -prune

only faster, by using Haskell's concurrency features.

This is what I've got so far:

import Control.Concurrent.Async
import System.Directory (doesDirectoryExist)
import System.FilePath ((</>))
import System.IO (FilePath)


isGitRepo :: FilePath -> IO Bool
isGitRepo p = doesDirectoryExist $ p </> ".git"


main :: IO ()
main = putStrLn "hello"

I've found this lib, which has the function

mapConcurrently :: Traversable t => (a -> IO b) -> t a -> IO (t b)

Which got me thinking that what I need is to produce a lazy Tree data structure that would reflect the folder structure, then filter it concurrently with isGitRepo, and then fold it into a list and print it.

Well, of course I know how to make data FTree = Node String [FTree] or something like that, but I have questions. How to produce it concurrently? How to produce absolute path while traversing the tree? Questions like that, and so on.

Recommended answer


Which got me thinking that what I need is to produce a lazy Tree data structure that would reflect the folder structure.

I'm not sure you need a tree structure for this. You could build such an intermediate structure, but you could just as well manage without one. The key thing is that you need O(1) appending to combine your results. A difference list (like dlist) gives you this.
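To make the O(1) append concrete, here is a minimal hand-rolled difference list using only base. The names DiffList, emptyD, singletonD, appendD, concatD and toListD are hypothetical and chosen for illustration; the dlist package offers the same operations in Data.DList (empty, singleton, append, concat, toList), and this is only a sketch of the idea, not that package's actual source.

```haskell
-- A difference list represents a list as a function that prepends
-- its elements to whatever list comes after it.
newtype DiffList a = DiffList ([a] -> [a])

-- The empty list prepends nothing.
emptyD :: DiffList a
emptyD = DiffList id

-- A one-element list prepends that element.
singletonD :: a -> DiffList a
singletonD x = DiffList (x :)

-- Appending is function composition, so it costs O(1)
-- regardless of how long either side is.
appendD :: DiffList a -> DiffList a -> DiffList a
appendD (DiffList f) (DiffList g) = DiffList (f . g)

-- Combine many partial results, e.g. one per subdirectory.
concatD :: [DiffList a] -> DiffList a
concatD = foldr appendD emptyD

-- Materialise an ordinary list once, at the very end.
toListD :: DiffList a -> [a]
toListD (DiffList f) = f []
```

The point of the design is that each recursive call only composes functions; the actual list is built a single time when toListD is applied at the top level.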


How to produce it concurrently?

You've already got that: use mapConcurrently.
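As a small illustration of mapConcurrently on its own, the sketch below runs doesDirectoryExist over a list of paths concurrently. The helper name checkAll is hypothetical; the code assumes the async and directory packages are available.

```haskell
import Control.Concurrent.Async (mapConcurrently)
import System.Directory (doesDirectoryExist)

-- | Hypothetical helper: test every path concurrently and pair each
-- path with its result. mapConcurrently preserves the shape of the
-- input container, so the results line up with the input order.
checkAll :: [FilePath] -> IO [(FilePath, Bool)]
checkAll paths = do
  flags <- mapConcurrently doesDirectoryExist paths
  pure (zip paths flags)
```

Note that mapConcurrently starts one thread per element, so on very wide directory trees you may want to think about limiting how many actions run at once.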


How to produce absolute path while traversing the tree?

listDirectory lets you get the next possible segments in the path. You can get the next paths by appending each of these segments to the existing path (though they won't be absolute paths unless the existing path was).
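A minimal sketch of that step, assuming the directory and filepath packages: childPaths and absoluteChildPaths are hypothetical helper names. The second one uses makeAbsolute from System.Directory on the starting path, which is one way to make every path below it absolute.

```haskell
import System.Directory (listDirectory, makeAbsolute)
import System.FilePath ((</>))

-- | Hypothetical helper: paths of the entries directly under 'dir'.
-- listDirectory returns bare entry names (without "." and ".."),
-- so each name is joined back onto 'dir'.
childPaths :: FilePath -> IO [FilePath]
childPaths dir = do
  names <- listDirectory dir
  pure (map (dir </>) names)

-- | Hypothetical helper: same as 'childPaths', but the starting path
-- is made absolute first, so every resulting path is absolute too.
absoluteChildPaths :: FilePath -> IO [FilePath]
absoluteChildPaths dir = makeAbsolute dir >>= childPaths
```

Making only the root absolute is enough, because every deeper path is built by appending segments to it.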

Here is a working function:

import System.Directory (doesDirectoryExist, listDirectory)
import System.FilePath ((</>), combine)
import System.IO (FilePath)
import Control.Concurrent.Async (mapConcurrently)
import qualified Data.DList as DL

-- | tries to find all git repos in the subtree rooted at the path
findGitRepos :: FilePath -> IO (DL.DList FilePath)
findGitRepos p = do
  isNotDir <- not <$> doesDirectoryExist p
  if isNotDir
    then pure DL.empty             -- the path 'p' isn't a directory
    else do
      isGitDir <- doesDirectoryExist (p </> ".git")
      if isGitDir
        then pure (DL.singleton p) -- the folder is a git repo
        else do                    -- recurse to subfolders
          subdirs <- listDirectory p
          repos <- mapConcurrently findGitRepos (combine p `map` subdirs)
          pure (DL.concat repos)
