Python, how to implement something like .gitignore behavior


Question

I need to list all files in the current directory (.), including all subdirectories, and exclude some files the way .gitignore does (http://git-scm.com/docs/gitignore).

With fnmatch (https://docs.python.org/2/library/fnmatch.html) I can "filter" files using a pattern:

import fnmatch
import os

ignore_files = ['*.jpg', 'foo/', 'bar/hello*']
matches = []
for root, dirnames, filenames in os.walk('.'):
    for filename in fnmatch.filter(filenames, '*'):
        matches.append(os.path.join(root, filename))

How can I "filter" these and get all files that don't match any element of my "ignore_files"?

Thanks!

Recommended answer

You're on the right track: If you want to use fnmatch-style patterns, you should use fnmatch.filter with them.

But there are three problems that make this not quite trivial.

First, you want to apply multiple filters. How do you do that? Call filter multiple times:

for ignore in ignore_files:
    filenames = fnmatch.filter(filenames, ignore)

Second, you actually want to do the reverse of filter: return the subset of names that don't match. As the documentation explains:

It is the same as [n for n in names if fnmatch(n, pattern)], but implemented more efficiently.

So, to do the opposite, you just throw in a not:

for ignore in ignore_files:
    filenames = [n for n in filenames if not fnmatch.fnmatch(n, ignore)]

Finally, you're attempting to filter on partial pathnames, not just filenames, but you're not doing the join until after the filtering. So switch the order:

filenames = [os.path.join(root, filename) for filename in filenames]
for ignore in ignore_files:
    filenames = [n for n in filenames if not fnmatch.fnmatch(n, ignore)]
matches.extend(filenames)
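
Putting the three fixes together might look like the sketch below. It is only one possible arrangement: the function name `list_unignored` is hypothetical, and it assumes the patterns are written relative to the starting directory. It uses os.path.relpath because os.path.join('.', ...) produces names like './bar/hello.txt', which a pattern such as 'bar/hello*' would not match. Note also that fnmatch gives a trailing slash no special meaning, so a directory pattern like 'foo/' would have to be written as 'foo/*' here.

```python
import fnmatch
import os


def list_unignored(top, ignore_patterns):
    """Return paths under `top` (relative to it) that match no pattern."""
    matches = []
    for root, dirnames, filenames in os.walk(top):
        # Join first, so patterns such as 'bar/hello*' can see the
        # directory part; relpath drops the leading './' prefix.
        paths = [os.path.relpath(os.path.join(root, name), top)
                 for name in filenames]
        # Apply every ignore pattern in turn, keeping only non-matches.
        for pattern in ignore_patterns:
            paths = [p for p in paths if not fnmatch.fnmatch(p, pattern)]
        matches.extend(paths)
    return matches
```

Calling `list_unignored('.', ['*.jpg', 'foo/*', 'bar/hello*'])` would then return every file below the current directory except those the patterns exclude.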


There are a few ways you could improve this.

You may want to use a generator expression instead of a list comprehension (parentheses instead of square brackets), so if you have huge lists of filenames you're using a lazy pipeline instead of wasting time and space repeatedly building huge lists.

Also, it may or may not be easier to understand if you invert the order of the loops, like this:

filenames = (n for n in filenames
             if not any(fnmatch.fnmatch(n, ignore) for ignore in ignore_files))
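
To see the lazy, inverted-loop form in isolation, here is a small sketch that works on an in-memory list of already-joined paths. The helper name `filter_ignored` and the sample paths are made up for illustration.

```python
import fnmatch


def filter_ignored(paths, ignore_patterns):
    """Lazily yield the paths that match none of the ignore patterns."""
    # any() stops at the first pattern that matches, and the generator
    # expression produces results one at a time instead of building a list.
    return (p for p in paths
            if not any(fnmatch.fnmatch(p, pattern)
                       for pattern in ignore_patterns))


sample_paths = ['a.txt', 'b.jpg', 'bar/hello.txt', 'bar/keep.txt', 'foo/x.py']
kept = filter_ignored(sample_paths, ['*.jpg', 'foo/*', 'bar/hello*'])
print(list(kept))
```

Nothing is filtered until the result is iterated, so the same helper can be chained after os.walk without materializing intermediate lists.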

Finally, if you're worried about performance, you can use fnmatch.translate on each expression to turn them into equivalent regexps, then merge them into one big regexp and compile it, and use that instead of a loop around fnmatch. This can get tricky if your patterns are allowed to be more complicated than just *.jpg, and I wouldn't recommend it unless you really do identify a performance bottleneck here. But if you need to do it, I've seen at least one question on SO where someone put a lot of effort into hammering out all the edge cases, so search instead of trying to write it yourself.
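
For simple patterns, the merged-regexp idea could be sketched roughly as follows. This is not a hardened implementation: `compile_ignore` is a hypothetical helper, it assumes a Python 3 version where fnmatch.translate returns a self-contained regexp string, and it skips the case normalization that fnmatch.fnmatch applies on some platforms.

```python
import fnmatch
import re


def compile_ignore(patterns):
    """Compile fnmatch-style patterns into one regexp matching any of them."""
    if not patterns:
        # An empty alternation would match everything; (?!) never matches.
        return re.compile(r'(?!)')
    # Each translated pattern is already anchored to the end of the name,
    # so wrapping each in a non-capturing group and joining with '|' is enough.
    combined = '|'.join('(?:%s)' % fnmatch.translate(p) for p in patterns)
    return re.compile(combined)


ignored = compile_ignore(['*.jpg', 'foo/*', 'bar/hello*'])
sample_paths = ['a.txt', 'b.jpg', 'bar/hello.txt', 'bar/keep.txt', 'foo/x.py']
print([p for p in sample_paths if not ignored.match(p)])
```

The compiled object replaces the inner loop over patterns: each path is tested with a single `ignored.match(p)` call.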
