Filter a list of paths to only include files


Problem description

If I have a list of FilePaths, how can I filter them to return only the ones that are regular files (namely, not symlinks or directories)?

For example, using getDirectoryContents:

main = do
    contents <- getDirectoryContents "/foo/bar"
    let onlyFiles = filterFunction contents in
        print onlyFiles

where "filterFunction" is a function that returns only the FilePaths that represent files.

The answer may just work on Linux, but cross platform support is preferred.

[EDIT] Just using doesDirectoryExist doesn't work as expected. This script prints a list of everything in the directory, not just files:

module Main where

import System.Directory
import Control.Monad (filterM, liftM)

getFiles :: FilePath -> IO [FilePath]
getFiles root = do
    contents <- getDirectoryContents root
    filesHere <- filterM (liftM not . doesDirectoryExist) contents
    subdirs <- filterM doesDirectoryExist contents
    return filesHere

main = do
    files <- getFiles "/"
    print $ files

Additionally, the variable subdirs will only contain "." and "..".

Solution

To find standard library functions, Hoogle is a great resource; it's a Haskell search engine that lets you search by type. Using it requires figuring out how to think about types the Haskell Way™, though, which your proposed type signature doesn't quite fit. So:

  1. You're looking for [Filepath] -> [Filepath]. Remember, the Haskell spelling is FilePath. So…

  2. You're looking for [FilePath] -> [FilePath]. This is unnecessary; if you want to filter things, you should use filter. So…

  3. You're looking for a function of type FilePath -> Bool that you can pass to filter. But this can't quite be right: this function needs to query the file system, which is an effect, and Haskell tracks effects in the type system using IO. So…

  4. You're looking for a function of type FilePath -> IO Bool.

And if we search for that on Hoogle, the first result is doesFileExist :: FilePath -> IO Bool from System.Directory. From the docs:

The operation doesFileExist returns True if the argument file exists and is not a directory, and False otherwise.

So System.Directory.doesFileExist is exactly what you want. (Well… only with a little extra work! See below.)

Now, how do you use it? You can't use filter here, because you have an effectful function. You could use Hoogle again (if filter has the type (a -> Bool) -> [a] -> [a], then annotating the results of the functions with a monad m gives you the new type Monad m => (a -> m Bool) -> [a] -> m [a]), but there's an easier "cheap trick". In general, if func is a function with an effectful/monadic version, that effectful/monadic version is called funcM, and it often lives in Control.Monad.¹ And indeed, there is a function Control.Monad.filterM :: Monad m => (a -> m Bool) -> [a] -> m [a].
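To make that type concrete, here is a rough sketch of how a filterM-like function could be written by hand. The names filterM' and evenNonNegative are hypothetical and exist only for this illustration; the real Control.Monad.filterM may well be implemented differently.

```haskell
import Control.Monad (filterM)

-- Hypothetical hand-rolled equivalent of filterM, for illustration only.
filterM' :: Monad m => (a -> m Bool) -> [a] -> m [a]
filterM' _ []     = return []
filterM' p (x:xs) = do
  keep <- p x            -- run the effectful predicate on the head
  rest <- filterM' p xs  -- then filter the tail
  return (if keep then x : rest else rest)

-- Hypothetical effectful predicate in the Maybe monad:
-- it fails (Nothing) on negative numbers.
evenNonNegative :: Int -> Maybe Bool
evenNonNegative n
  | n < 0     = Nothing
  | otherwise = Just (even n)

main :: IO ()
main = do
  print (filterM' evenNonNegative [1, 2, 3, 4])  -- Just [2,4]
  print (filterM' evenNonNegative [1, -2, 3])    -- Nothing
  -- Compare against the library version on the same input.
  print (filterM' evenNonNegative [0 .. 5] == filterM evenNonNegative [0 .. 5])
```

The same shape works in IO, which is the case that matters for doesFileExist: the predicate runs once per element, and the results are collected in order.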

However! Much as we hate to admit it, even in Haskell, types don't provide all the information you need. Importantly, we're going to have a problem here:

  • File paths given as arguments to functions are interpreted relative to the current directory, but…
  • getDirectoryContents returns paths relative to its argument.
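The mismatch described in the two points above can be observed with a small experiment. This is only a sketch: the scratch directory and file names are hypothetical, and it assumes the process may write to the system temporary directory.

```haskell
import System.Directory
  ( createDirectoryIfMissing, doesFileExist, getDirectoryContents
  , getTemporaryDirectory, removeDirectoryRecursive )
import System.FilePath ((</>))

main :: IO ()
main = do
  tmp <- getTemporaryDirectory
  let dir  = tmp </> "filter-paths-demo"   -- hypothetical scratch directory
      name = "filter-paths-demo-file.txt"  -- hypothetical file name
  createDirectoryIfMissing True dir
  writeFile (dir </> name) "hello"

  -- getDirectoryContents returns bare names, relative to dir.
  names <- getDirectoryContents dir
  print (name `elem` names)

  -- The bare name is resolved against the current directory, so this is
  -- False unless the current directory happens to contain such a file.
  bare <- doesFileExist name
  -- The qualified path points at the file that was just written.
  qualified <- doesFileExist (dir </> name)
  print (bare, qualified)

  removeDirectoryRecursive dir
```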

Thus, there are two approaches we can take to fix things. The first is to adjust the results of getDirectoryContents so that they can be interpreted correctly. (We also discard the . and .. results, although if you're just looking for regular files they won't hurt anything.) This will return file names which include the directory whose contents are being examined. The adjusted getDirectoryContents function looks like this:

getQualifiedDirectoryContents :: FilePath -> IO [FilePath]
getQualifiedDirectoryContents fp =
    map (fp </>) . filter (`notElem` [".",".."]) <$> getDirectoryContents fp

The filter gets rid of the special directories, and the map prepends the argument directory to all the results. This makes the returned files acceptable arguments to doesFileExist. (If you haven't seen them before, (System.FilePath.</>) appends two file paths; and (Control.Applicative.<$>), also available as (Data.Functor.<$>), is an infix synonym for fmap, which is like liftM but more broadly applicable.)

Putting that all together, your final code becomes:

import Control.Applicative
import Control.Monad
import System.FilePath
import System.Directory

getQualifiedDirectoryContents :: FilePath -> IO [FilePath]
getQualifiedDirectoryContents fp =
    map (fp </>) . filter (`notElem` [".",".."]) <$> getDirectoryContents fp

main :: IO ()
main = do
  contents  <- getQualifiedDirectoryContents "/foo/bar"
  onlyFiles <- filterM doesFileExist contents
  print onlyFiles

Or, if you feel like being fancy/point-free:

import Control.Applicative
import Control.Monad
import System.FilePath
import System.Directory

getQualifiedDirectoryContents :: FilePath -> IO [FilePath]
getQualifiedDirectoryContents fp =
    map (fp </>) . filter (`notElem` [".",".."]) <$> getDirectoryContents fp

main :: IO ()
main =   print
     =<< filterM doesFileExist
     =<< getQualifiedDirectoryContents "/foo/bar"
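The question also asked to exclude symbolic links, which doesFileExist alone does not do, since it follows links. One possible sketch is shown here, assuming a version of the directory package that provides pathIsSymbolicLink (1.3.0.0 or later) and listDirectory (1.2.5.0 or later). The helper names isRegularFile and listRegularFiles are hypothetical.

```haskell
import Control.Monad (filterM)
import System.Directory (doesFileExist, listDirectory, pathIsSymbolicLink)
import System.FilePath ((</>))

-- Hypothetical helper: True only for an existing file that is not a symlink.
isRegularFile :: FilePath -> IO Bool
isRegularFile path = do
  isFile <- doesFileExist path  -- follows links; False for directories
  if isFile
    then not <$> pathIsSymbolicLink path
    else return False

-- Hypothetical helper: qualified paths of the regular files directly in dir.
-- listDirectory already omits the special entries "." and "..".
listRegularFiles :: FilePath -> IO [FilePath]
listRegularFiles dir =
  filterM isRegularFile . map (dir </>) =<< listDirectory dir

main :: IO ()
main = print =<< listRegularFiles "."
```

Checking doesFileExist first means that broken links and directories are rejected before pathIsSymbolicLink is consulted.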

The second approach is to adjust things so that doesFileExist runs with the appropriate current directory. This will return just the file names relative to the directory whose contents are being examined. To do this, we want to use the withCurrentDirectory :: FilePath -> IO a -> IO a function (but see below), and then pass getDirectoryContents the current directory, ".", as its argument. The documentation for withCurrentDirectory says (in part):

Run an IO action with the given working directory and restore the original working directory afterwards, even if the given action fails due to an exception.

Putting all this together gives us the following code:

import Control.Monad
import System.Directory

main :: IO ()
main = withCurrentDirectory "/foo/bar" $
         print =<< filterM doesFileExist =<< getDirectoryContents "."

This is what we want, but unfortunately, withCurrentDirectory is only available starting with version 1.2.3.0 of the directory package; as of this writing, that is the most recent one, and not the one I have. Luckily, it's an easy function to implement; such set-a-value-locally functions are usually implemented in terms of Control.Exception.bracket :: IO a -> (a -> IO b) -> (a -> IO c) -> IO c. The bracket function is called as bracket before after action: it runs before, passes the result to action, and then runs after on that same result, even if action throws an exception. So we can define withCurrentDirectory ourselves:

withCurrentDirectory :: FilePath -> IO a -> IO a
withCurrentDirectory fp m =
  bracket getCurrentDirectory setCurrentDirectory $ \_ -> do
    setCurrentDirectory fp
    m

And then use this to get the final code:

import Control.Exception
import Control.Monad
import System.Directory

withCurrentDirectory :: FilePath -> IO a -> IO a
withCurrentDirectory fp m =
  bracket getCurrentDirectory setCurrentDirectory $ \_ -> do
    setCurrentDirectory fp
    m

main :: IO ()
main = withCurrentDirectory "/foo/bar" $
         print =<< filterM doesFileExist =<< getDirectoryContents "."
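To check that the bracket-based definition restores the working directory when the action fails, a small experiment along these lines can help. It is a sketch only: the function is given the primed, hypothetical name withCurrentDirectory' so that it does not clash with the version exported by newer releases of directory, and the failing action is made up for the test.

```haskell
import Control.Exception (IOException, bracket, try)
import System.Directory
  (getCurrentDirectory, getTemporaryDirectory, setCurrentDirectory)

-- Same definition as above, under a primed (hypothetical) name.
withCurrentDirectory' :: FilePath -> IO a -> IO a
withCurrentDirectory' fp m =
  bracket getCurrentDirectory setCurrentDirectory $ \_ -> do
    setCurrentDirectory fp
    m

main :: IO ()
main = do
  before <- getCurrentDirectory
  tmp    <- getTemporaryDirectory
  -- The action throws while the working directory is changed.
  result <- try (withCurrentDirectory' tmp (ioError (userError "boom")))
              :: IO (Either IOException ())
  after  <- getCurrentDirectory
  putStrLn (either (const "action failed") (const "action succeeded") result)
  print (before == after)  -- True: the original directory was restored
```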


Also, one quick note about let in do blocks: in a do block,

do ...foo...
   let x = ...bar...
   ...baz...

is equivalent to

do ...foo...
   let x = ...bar... in
     do ...baz...

So your example code doesn't need the in after the let, and the print call can be outdented.


¹ Not always: sometimes you want different classes of effects! Use Applicative from Control.Applicative when possible; more things are Applicatives than are Monads (although this means you can do less with them). In that case, the effectful functions may live there, or also in Data.Foldable or Data.Traversable.
