How to grep a term from S3 and output object name
Problem description
I need to grep a term over thousands of files in S3, and list those file names in some output file. I'm quite new to the CLI, so I've been testing both locally and on a small subset in S3.
So far I've got this:
aws s3 cp s3://mybucket/path/to/file.csv - | grep -iln searchterm > output.txt
The problem is the hyphen. Since I'm copying to standard output, grep's -l switch returns (standard input) instead of file.csv
The output I want is
file.csv
Eventually, I'll need to iterate this over the whole bucket, and then over all buckets, to get
file1.csv
file2.csv
file3.csv
But I need to get over this hurdle first. Thanks!
Recommended answer
Because you print the file to STDOUT and pipe that to grep's STDIN, grep has no idea that the original file was file.csv. If you have a long list of files, I would do:
while read -r file; do aws s3 cp "s3://mybucket/path/to/${file}" - | grep -q searchterm && echo "${file}" >> output.txt; done < files_list.txt
I cannot try it, because I do not have access to an AWS S3 instance, but the trick is to use grep quietly (-q): it returns true if it finds at least one match, and false otherwise. Then you can print the name of the file.
- The while loop iterates over each line of files_list.txt.
- The aws command prints this file to stdout.
- We pipe stdout to grep in quiet mode (-q), which acts as a pattern matcher, returning true if a match was found and false otherwise.
- If grep returns true, we append the name of the file (${file}) to our output file.
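The steps above can be sketched as a small pair of shell functions. This is only an illustration under stated assumptions: fetch_object and list_matching are hypothetical names, and fetch_object reads a local file with cat so the loop can be tried without S3 access. The -i flag is taken from the question's original command.

```shell
#!/usr/bin/env bash
# Sketch of the grep -q loop. fetch_object is a hypothetical stand-in:
# to run against a real bucket, replace its body with
#   aws s3 cp "s3://mybucket/path/to/$1" -
fetch_object() {
  cat "$1"
}

# list_matching TERM LIST_FILE
# Prints the name of every entry in LIST_FILE whose contents match TERM.
list_matching() {
  local term=$1 list=$2 file
  while IFS= read -r file; do
    # grep -q prints nothing; only its exit status is used here.
    if fetch_object "$file" | grep -qi -- "$term"; then
      printf '%s\n' "$file"
    fi
  done < "$list"
}
```

It would be called as list_matching searchterm files_list.txt > output.txt. Quoting "$file" keeps names containing spaces intact.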
Other solution
while read -r file; do aws s3 cp "s3://mybucket/path/to/${file}" - | sed -n '/searchpattern/{F;q}' >> output.txt; done < files_list.txt
Explanation
Steps 1 and 2 are the same, then:
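- stdout is piped to sed, which reads the input line by line until it finds the first line matching the pattern, prints the file name (F) to the output file, and then quits (q). Note that GNU sed's F prints - when the input is standard input, as it is here, so this variant may write - rather than the object name; the grep -q version above avoids this by echoing ${file} itself.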
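As a related sketch, assuming GNU grep is available: its --label option tells grep what name to report for standard input, so -l can print the object name directly. The helper name grep_stdin_as is hypothetical and not part of the original answer.

```shell
#!/usr/bin/env bash
# Sketch, assuming GNU grep: --label=NAME makes -l report NAME
# instead of "(standard input)" when reading from a pipe.
# grep_stdin_as NAME TERM  (hypothetical helper)
grep_stdin_as() {
  local name=$1 term=$2
  grep -il --label="$name" -- "$term"
}

# Example against S3 (not run here):
#   aws s3 cp s3://mybucket/path/to/file.csv - | grep_stdin_as file.csv searchterm >> output.txt
```

Because -l stops at the first match, this behaves much like the -q version while letting grep print the name itself.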