Shell: find files in a list under a directory
I have a list of about 1000 file names to search for under a directory and its subdirectories. There are hundreds of subdirectories with more than 1,000,000 files. The following command runs find 1000 times:
cat filelist.txt | while read f; do find /dir -name "$f"; done
Is there a much faster way to do it?
If filelist.txt has a single filename per line:
find /dir | grep -f <(sed 's@^@/@; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' filelist.txt)
(The -f option tells grep to read its patterns from the given file, one per line, and to print any input line that matches at least one of them.)
Explanation of <(sed 's@^@/@; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' filelist.txt):
The <( ... ) is called a process substitution, and is a little similar to $( ... ). Here it is equivalent to the following (but using the process substitution is neater and possibly a little faster):
sed 's@^@/@; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' filelist.txt > processed_filelist.txt
find /dir | grep -f processed_filelist.txt
The call to sed runs the commands s@^@/@, s/$/$/ and s/\([\.[\*]\|\]\)/\\\1/g on each line of filelist.txt and prints the results. These commands convert the filenames into a format that will work better with grep.
s@^@/@ means put a / before each filename. (The ^ means "start of line" in a regex.)
s/$/$/ means put a $ at the end of each filename. (The first $ means "end of line"; the second is just a literal $, which grep then interprets as "end of line".)
The combination of these two rules means that grep will only look for matches like .../<filename>, so that a.txt doesn't match ./a.txt.backup or ./abba.txt.
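To make the effect of the two anchoring rules concrete, here is a small sketch that runs grep over a handful of made-up paths (they are illustrative only, not taken from the question) with and without the leading / and trailing $:

```shell
# Candidate paths, as find might print them (made-up names).
paths='./a.txt
./a.txt.backup
./abba.txt
./sub/a.txt'

# Unanchored: the escaped name can match anywhere in the path.
unanchored=$(printf '%s\n' "$paths" | grep 'a\.txt')

# Anchored: the leading / and trailing $ pin the match to the last path component.
anchored=$(printf '%s\n' "$paths" | grep '/a\.txt$')

printf 'unanchored:\n%s\n\nanchored:\n%s\n' "$unanchored" "$anchored"
```

The unanchored pattern matches all four paths, while the anchored one keeps only ./a.txt and ./sub/a.txt.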
s/\([\.[\*]\|\]\)/\\\1/g puts a \ before each occurrence of ., [, ] or *. Grep uses regexes and those characters are considered special, but we want them to be plain, so we need to escape them (if we didn't escape them, then a file name like a.txt would match files like abtxt).
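The a.txt versus abtxt case can be checked directly. This is a minimal sketch using two made-up names and grep -c to count matching lines; the ^ and $ anchors are only there to keep the comparison to whole names:

```shell
names='a.txt
abtxt'

# Unescaped: "." matches any single character, so both names match.
loose=$(printf '%s\n' "$names" | grep -c '^a.txt$')

# Escaped: "\." matches only a literal dot, so only a.txt matches.
strict=$(printf '%s\n' "$names" | grep -c '^a\.txt$')

echo "unescaped matches: $loose, escaped matches: $strict"
```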
As an example:
$ cat filelist.txt
file1.txt
file2.txt
blah[2012].txt
blah[2011].txt
lastfile
$ sed 's@^@/@; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' filelist.txt
/file1\.txt$
/file2\.txt$
/blah\[2012\]\.txt$
/blah\[2011\]\.txt$
/lastfile$
Grep then uses each line of that output as a pattern when it is searching the output of find.
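For anyone who wants to try the whole pipeline safely, the sketch below builds a throwaway tree and runs the same sed script against it. The directory layout, the file names, and patterns.txt are all invented for the demonstration. It uses the temporary-file form rather than <( ... ) so that it does not depend on bash, and it assumes GNU sed, since \| in a basic regex is a GNU extension.

```shell
# Scratch tree with made-up files; removed again at the end.
tmp=$(mktemp -d)
mkdir -p "$tmp/dir/sub"
touch "$tmp/dir/file1.txt" "$tmp/dir/file1.txt.backup" \
      "$tmp/dir/sub/blah[2012].txt" "$tmp/dir/sub/blah2.txt" \
      "$tmp/dir/sub/lastfile"

# The list of names to look for, one per line.
printf '%s\n' 'file1.txt' 'blah[2012].txt' 'lastfile' > "$tmp/filelist.txt"

# Same sed script as in the answer, written to a file instead of using <( ... ).
sed 's@^@/@; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' "$tmp/filelist.txt" > "$tmp/patterns.txt"

# One pass of find, filtered by all patterns at once; sorted for stable output.
found=$(cd "$tmp" && find dir | grep -f patterns.txt | LC_ALL=C sort)
printf '%s\n' "$found"

rm -rf "$tmp"
```

Neither file1.txt.backup nor blah2.txt is reported: the first is excluded by the trailing $, and the second would only match if [2012] were left unescaped and read as a bracket expression.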