Shell: find files in a list
Question
I have a list containing about 1000 file names. I want to find the paths of these files under a base directory. There are many subdirectories in the directory, and more than 1,000,000 files in the subdirectories. The following command runs find 1000 times:
cat filelist.txt | while read -r f; do find /dir -name "$f"; done
Is there faster or better way to do it?
Recommended answer
If filelist.txt has a single filename per line:
find /dir | grep -f <(sed 's@^@/@; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' filelist.txt)
(The -f option makes grep read its patterns from the given file, one pattern per line, and print every input line that matches any of them.)
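To see what -f does on its own, here is a minimal sketch using two hypothetical throwaway files: a pattern file and a short stand-in for the output of find. A line is printed if it matches any one of the patterns.

```shell
#!/usr/bin/env bash
# Sketch: grep -f treats every line of the pattern file as a separate pattern.
# patterns.txt and paths.txt are hypothetical files created just for this demo.
tmp=$(mktemp -d)
printf '%s\n' 'alpha' 'gamma' > "$tmp/patterns.txt"
printf '%s\n' '/dir/alpha.txt' '/dir/beta.txt' '/dir/sub/gamma.log' > "$tmp/paths.txt"

# Prints /dir/alpha.txt and /dir/sub/gamma.log; /dir/beta.txt matches neither pattern.
matches=$(grep -f "$tmp/patterns.txt" "$tmp/paths.txt")
printf '%s\n' "$matches"
rm -rf "$tmp"
```

This is why a single grep pass over the find output can replace 1000 separate find runs.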
Explanation of <(sed 's@^@/@; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' filelist.txt):
The <( ... ) is called process substitution, and is a little similar to $( ... ): instead of inserting the command's output as text, bash passes grep a file name from which that output can be read. The situation is equivalent to the following (but using process substitution is neater, avoids a temporary file, and is possibly a little faster):
sed 's@^@/@; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' filelist.txt > processed_filelist.txt
find /dir | grep -f processed_filelist.txt
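A small sketch that runs both forms on a hypothetical throwaway tree (standing in for /dir) and checks that they agree. It assumes bash, for <( ... ), and GNU sed, since the sed script uses \|, which is not part of POSIX basic regular expressions.

```shell
#!/usr/bin/env bash
# Sketch: process substitution vs. an explicit temporary file give the same result.
# Assumes bash and GNU sed; the mktemp tree is a hypothetical stand-in for /dir.
tmp=$(mktemp -d)
mkdir -p "$tmp/dir/sub"
touch "$tmp/dir/file1.txt" "$tmp/dir/sub/file2.txt" "$tmp/dir/other.txt"
printf '%s\n' file1.txt file2.txt > "$tmp/filelist.txt"

script='s@^@/@; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g'

# Form 1: process substitution; grep reads sed's output through a file name
# that bash supplies (typically something like /dev/fd/N).
with_procsub=$(find "$tmp/dir" | grep -f <(sed "$script" "$tmp/filelist.txt") | sort)

# Form 2: write the processed patterns to a real file first.
sed "$script" "$tmp/filelist.txt" > "$tmp/processed_filelist.txt"
with_tempfile=$(find "$tmp/dir" | grep -f "$tmp/processed_filelist.txt" | sort)

if [ "$with_procsub" = "$with_tempfile" ]; then echo "same output"; fi
printf '%s\n' "$with_procsub"
rm -rf "$tmp"
```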
The call to sed runs the commands s@^@/@, s/$/$/ and s/\([\.[\*]\|\]\)/\\\1/g on each line of filelist.txt and prints the results. These commands convert the filenames into a format that works better with grep.
- s@^@/@ puts a / before each filename. (The ^ means "start of line" in a regex.)
- s/$/$/ puts a $ at the end of each filename. (The first $ means "end of line"; the second is just a literal $, which grep then interprets as "end of line".)
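Applying these two commands one at a time to a single example name (a.txt, chosen only for illustration) shows what each one contributes:

```shell
#!/usr/bin/env bash
# Sketch: the two anchoring sed commands, applied separately to one name.
name='a.txt'
step1=$(printf '%s\n' "$name" | sed 's@^@/@')    # prepend "/"
step2=$(printf '%s\n' "$step1" | sed 's/$/$/')   # append a literal "$"
echo "$step1"   # /a.txt
echo "$step2"   # /a.txt$
```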
The combination of these two rules means that grep will only look for matches like .../<filename>, so that a.txt doesn't match ./a.txt.backup or ./abba.txt.
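A quick sketch with the three example paths from the text, comparing the anchored pattern against a bare one. Here \. is written by hand to keep the dot literal; the third sed command, described below, adds that escaping automatically.

```shell
#!/usr/bin/env bash
# Sketch: why the leading "/" and trailing "$" matter.
paths=$(printf '%s\n' ./a.txt ./a.txt.backup ./abba.txt)

anchored=$(printf '%s\n' "$paths" | grep '/a\.txt$')   # ./a.txt only
unanchored=$(printf '%s\n' "$paths" | grep 'a\.txt')   # all three paths

echo "anchored:";   printf '%s\n' "$anchored"
echo "unanchored:"; printf '%s\n' "$unanchored"
```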
s/\([\.[\*]\|\]\)/\\\1/g puts a \ before each occurrence of ., [, ] or *. grep uses regexes, in which those characters are special, but we want them matched literally, so we need to escape them. (If we didn't escape them, a file name like a.txt would match files like abtxt.)
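The sketch below runs the answer's full sed script on two sample names and compares the escaped patterns against unescaped ones. It assumes GNU sed (for \|); the four paths are hypothetical. Without escaping, . matches any character and [2012] is a character class matching a single 2, 0 or 1.

```shell
#!/usr/bin/env bash
# Sketch: escaping ., [ and ] makes grep match the names literally.
# Assumes GNU sed; the paths below are made up for illustration.
paths=$(printf '%s\n' ./a.txt ./abtxt './blah[2012].txt' ./blah2.txt)
escape='s@^@/@; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g'

dot_pat=$(printf '%s\n' 'a.txt' | sed "$escape")               # /a\.txt$
bracket_pat=$(printf '%s\n' 'blah[2012].txt' | sed "$escape")  # /blah\[2012\]\.txt$

# Unescaped patterns: "." is "any character", "[2012]" is a character class.
raw_dot=$(printf '%s\n' "$paths" | grep '/a.txt$')               # ./a.txt and ./abtxt
raw_bracket=$(printf '%s\n' "$paths" | grep '/blah[2012].txt$')  # ./blah2.txt only

# Escaped patterns match only the intended literal names.
esc_dot=$(printf '%s\n' "$paths" | grep "$dot_pat")              # ./a.txt
esc_bracket=$(printf '%s\n' "$paths" | grep "$bracket_pat")      # ./blah[2012].txt

printf '%s\n' "$raw_dot" --- "$esc_dot" --- "$raw_bracket" --- "$esc_bracket"
```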
As an example:
$ cat filelist.txt
file1.txt
file2.txt
blah[2012].txt
blah[2011].txt
lastfile
$ sed 's@^@/@; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' filelist.txt
/file1\.txt$
/file2\.txt$
/blah\[2012\]\.txt$
/blah\[2011\]\.txt$
/lastfile$
grep then uses each line of that output as a pattern when it searches the output of find.
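Putting it together, here is an end-to-end sketch of the answer's one-liner on a hypothetical throwaway tree, using the same filelist.txt as the example above. It assumes bash and GNU sed, and runs find from inside the tree so the results print as ./ paths.

```shell
#!/usr/bin/env bash
# Sketch: one find plus one grep over a small made-up tree standing in for /dir.
# Assumes bash (process substitution) and GNU sed (\| in the sed script).
root=$(mktemp -d)
mkdir -p "$root/dir/a/b" "$root/dir/c"
touch "$root/dir/file1.txt" \
      "$root/dir/a/file2.txt" \
      "$root/dir/a/b/blah[2012].txt" \
      "$root/dir/c/blah2.txt" \
      "$root/dir/c/file1.txt.backup" \
      "$root/dir/c/lastfile"
printf '%s\n' file1.txt file2.txt 'blah[2012].txt' 'blah[2011].txt' lastfile > "$root/filelist.txt"

# Run in a subshell so the cd does not leak; sort in the C locale for stable order.
found=$(cd "$root/dir" &&
        find . | grep -f <(sed 's@^@/@; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' "$root/filelist.txt") |
        LC_ALL=C sort)

printf '%s\n' "$found"
rm -rf "$root"
```

blah2.txt and file1.txt.backup are not reported, and blah[2011].txt simply has no match because it does not exist in the tree.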