Shell: find files in a list under a directory

Views: 559  Published: 2020/5/1 8:22:31  Tags: linux bash shell

Problem description

I have a list of about 1000 file names to search for under a directory and its subdirectories. There are hundreds of subdirectories with more than 1,000,000 files. The following command runs find 1000 times:

while IFS= read -r f; do find /dir -name "$f"; done < filelist.txt

Is there a much faster way to do it?

Solution

If filelist.txt has a single filename per line:

find /dir | grep -f <(sed 's@^@/@; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' filelist.txt)

(The -f option means that grep searches for all the patterns in the given file.)
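
To see what -f does on its own, here is a minimal sketch with a made-up pattern file (the patterns "apple" and "cherry" and the input lines are illustrative only):

```shell
#!/usr/bin/env bash
# Sketch: grep -f PATTERN_FILE reads one pattern per line from
# PATTERN_FILE and prints every input line matching at least one of them.
set -uo pipefail

patterns=$(mktemp)                     # throwaway pattern file
printf '%s\n' 'apple' 'cherry' > "$patterns"

# "banana split" matches neither pattern, so it is filtered out.
matches=$(printf '%s\n' 'apple pie' 'banana split' 'cherry tart' | grep -f "$patterns")
printf '%s\n' "$matches"

rm -f "$patterns"
```

This is what makes the single pass possible: one grep process holds all ~1000 patterns at once instead of one find per name.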

Explanation of <(sed 's@^@/@; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' filelist.txt):

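The <( ... ) construct is called process substitution and is somewhat similar to $( ... ): where $( ... ) is replaced by a command's output, <( ... ) is replaced by a file name from which that output can be read. Here it is equivalent to the following two commands (process substitution is just neater, and possibly a little faster, because no intermediate file is written):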

sed 's@^@/@; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' filelist.txt > processed_filelist.txt
find /dir | grep -f processed_filelist.txt
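
A quick way to convince yourself that the two forms are interchangeable is to run both against a small throwaway tree. This sketch assumes bash plus GNU find, sed and grep; the directory layout and file names are made up for the demonstration:

```shell
#!/usr/bin/env bash
# Sketch (assumes bash, GNU find/sed/grep): the process-substitution
# form and the temporary-file form give grep the same patterns.
set -uo pipefail

dir=$(mktemp -d)                       # throwaway tree to search
mkdir -p "$dir/sub"
touch "$dir/file1.txt" "$dir/sub/file2.txt" "$dir/other.txt"
list=$(mktemp)                         # stands in for filelist.txt
printf '%s\n' file1.txt file2.txt > "$list"

escape='s@^@/@; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g'

# 1) Process substitution: bash hands grep a path (such as /dev/fd/63)
#    that reads from sed's output; nothing is written to disk.
via_procsub=$(find "$dir" | grep -f <(sed "$escape" "$list") | LC_ALL=C sort)

# 2) The two-command version with an intermediate file.
processed=$(mktemp)
sed "$escape" "$list" > "$processed"
via_tmpfile=$(find "$dir" | grep -f "$processed" | LC_ALL=C sort)

printf '%s\n' "$via_procsub"
rm -rf "$dir" "$list" "$processed"
```

Both variables hold the same two paths (file1.txt and sub/file2.txt); other.txt is not in the list and is not printed.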

The call to sed runs the commands s@^@/@, s/$/$/ and s/\([\.[\*]\|\]\)/\\\1/g on each line of filelist.txt and prints the result. These commands turn each filename into a pattern that grep can match precisely.

s@^@/@ puts a / before each filename. (The ^ means "start of line" in a regex; @ is used as the delimiter so the / does not need escaping.)
s/$/$/ puts a $ at the end of each filename. (The first $ means "end of line"; the second is just a literal $, which grep then interprets as "end of line".)

The combination of these two rules means that grep will only look for matches like .../<filename>, so that a.txt doesn't match ./a.txt.backup or ./abba.txt.
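
The effect of the two anchors can be checked directly. In this sketch the paths are made-up examples in the style of find's output:

```shell
#!/usr/bin/env bash
# Sketch: why the leading "/" and trailing "$" matter.
set -uo pipefail

paths=$(printf '%s\n' './a.txt' './a.txt.backup' './abba.txt' './dir/a.txt')

# Without anchors, "a\.txt" also hits the backup file and abba.txt.
loose=$(printf '%s\n' "$paths" | grep 'a\.txt')
# With the "/" prefix and "$" suffix, only files named exactly a.txt match.
anchored=$(printf '%s\n' "$paths" | grep '/a\.txt$')

printf 'loose:\n%s\nanchored:\n%s\n' "$loose" "$anchored"
```

The loose pattern matches all four paths; the anchored one keeps only ./a.txt and ./dir/a.txt, whatever directory they are in.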

s/\([\.[\*]\|\]\)/\\\1/g puts a \ before each occurrence of ., [, ] or * (the \| alternation is a GNU sed extension). Grep uses regexes, in which these characters are special, but we want them matched literally, so we need to escape them (if we didn't, a file name like a.txt would also match files like abtxt).
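
Here is a small sketch of what goes wrong without the escaping, assuming GNU sed and GNU grep; the paths are invented for the demonstration:

```shell
#!/usr/bin/env bash
# Sketch: the third sed command makes ".", "[", "]" and "*" literal.
set -uo pipefail

escape='s@^@/@; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g'
escaped=$(printf '%s\n' 'a.txt' 'blah[2012].txt' | sed "$escape")
printf '%s\n' "$escaped"   # /a\.txt$ and /blah\[2012\]\.txt$

paths=$(printf '%s\n' './a.txt' './abtxt' './blah[2012].txt' './blah2.txt')

# Unescaped: "." matches any character and "[2012]" is a character class,
# so ./abtxt and ./blah2.txt match while ./blah[2012].txt does not.
raw=$(printf '%s\n' "$paths" | grep -e '/a.txt$' -e '/blah[2012].txt$')

# Escaped: only the literal names match.
exact=$(printf '%s\n' "$paths" | grep -f <(printf '%s\n' "$escaped"))

printf 'raw:\n%s\nexact:\n%s\n' "$raw" "$exact"
```

Note that the unescaped bracket pattern fails in both directions: it matches a file it should not (blah2.txt) and misses the one it should (blah[2012].txt).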

As an example:

$ cat filelist.txt
file1.txt
file2.txt
blah[2012].txt
blah[2011].txt
lastfile

$ sed 's@^@/@; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' filelist.txt
/file1\.txt$
/file2\.txt$
/blah\[2012\]\.txt$
/blah\[2011\]\.txt$
/lastfile$

Grep then uses each line of that output as a pattern when it is searching the output of find.
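
Putting it together, the whole pipeline can be wrapped in a function and run against a tree shaped like the example above. This is a sketch assuming bash plus GNU find, sed and grep; the function name find_listed_files and the directory layout are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch of the full pipeline. find_listed_files is a hypothetical name;
# assumes bash and GNU find/sed/grep.
set -uo pipefail

find_listed_files() {
    local root=$1 list=$2
    # One find over the whole tree, one grep holding every pattern.
    find "$root" | grep -f <(sed 's@^@/@; s/$/$/; s/\([\.[\*]\|\]\)/\\\1/g' "$list")
}

# Throwaway tree mirroring the filelist.txt example, plus two decoys.
root=$(mktemp -d)
mkdir -p "$root/a/b"
touch "$root/file1.txt" "$root/a/b/blah[2012].txt" "$root/a/lastfile" \
      "$root/file1.txt.backup" "$root/blah2.txt"
list=$(mktemp)
printf '%s\n' file1.txt file2.txt 'blah[2012].txt' 'blah[2011].txt' lastfile > "$list"

found=$(find_listed_files "$root" "$list" | LC_ALL=C sort)
printf '%s\n' "$found"

rm -rf "$root" "$list"
```

Only the three listed files that exist are printed; the decoys file1.txt.backup and blah2.txt are rejected by the anchoring and escaping respectively. The tree is walked once, in contrast to one find per list entry in the original loop.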
