How to grep a term from S3 and output object name

Problem description

I need to grep for a term across thousands of files in S3 and write the names of the matching files to an output file. I'm quite new to the CLI, so I've been testing both locally and on a small subset in S3.

So far I've got this:

aws s3 cp s3://mybucket/path/to/file.csv - | grep -iln searchterm > output.txt

The problem is the hyphen. Since I'm copying the object to standard output, the -l switch in grep returns (standard input) instead of file.csv.

The output I want is:

file.csv

Eventually, I'll need to iterate this over the whole bucket, and then over all buckets, to get:

file1.csv
file2.csv
file3.csv

But I need to get over this hurdle first. Thanks!

Recommended answer

Because you print the file to STDOUT and pipe that into grep's STDIN, grep has no idea that the original file was file.csv. If you have a long list of files, I would do:

while read -r file; do aws s3 cp "s3://mybucket/path/to/${file}" - | grep -q searchterm && echo "${file}" >> output.txt; done < files_list.txt

I cannot try it, because I do not have access to an AWS S3 instance, but the trick is to use grep quietly (-q): it returns true if it finds at least one match and false otherwise. Then you can print the name of the file.

  1. The while loop iterates over each line of files_list.txt.
  2. The aws command prints this file to stdout.
  3. We pipe stdout to grep in quiet mode (-q), which acts as a pattern matcher, returning true if a match was found and false otherwise.
  4. If grep returns true, we append the name of the file (${file}) to our output file.
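The four steps above can be wrapped in a small function, shown here as a minimal sketch rather than a tested production script. The function name grep_s3_keys and its two arguments are hypothetical; the bucket path is the placeholder from the question, and the list of names is assumed to arrive on standard input, one per line.

```shell
# grep_s3_keys PREFIX TERM
# Reads object names from stdin (one per line) and prints each name whose
# content contains TERM (case-insensitive). PREFIX is prepended to every
# name, e.g. s3://mybucket/path/to/ (placeholder taken from the question).
grep_s3_keys() {
  prefix=$1
  term=$2
  while IFS= read -r file; do
    # stdin of aws is redirected as a precaution, so the download command
    # cannot swallow lines that are meant for `read`.
    if aws s3 cp "${prefix}${file}" - </dev/null | grep -qi -- "$term"; then
      printf '%s\n' "$file"
    fi
  done
}
```

It could then be called as grep_s3_keys s3://mybucket/path/to/ searchterm < files_list.txt > output.txt. Printing the names and redirecting once at the call site avoids reopening output.txt for every match.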


Other solution

while read -r file; do aws s3 cp "s3://mybucket/path/to/${file}" - | sed -n '/searchpattern/q1' || echo "${file}" >> output.txt; done < files_list.txt

Explanation

Steps 1 and 2 are the same, then:

  3. stdout is piped to sed, which reads the stream line by line until it finds the first line matching the pattern and then quits (q) with exit status 1 (a GNU sed extension). Because that status is non-zero, the shell runs the echo after || and appends the file name to the output file. sed itself cannot supply the name: reading from a pipe, its F command (print the current file name) outputs only -.
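The hurdle from the question, grep -l reporting (standard input), can also be addressed on the grep side. The sketch below assumes GNU grep, whose --label option sets the name reported for standard input; the helper name label_match is hypothetical, and printf stands in for the aws s3 cp ... - download.

```shell
# label_match NAME TERM
# Reads data on stdin and prints NAME if TERM occurs in it (case-insensitive).
# Relies on --label, which GNU grep uses as the file name for standard input.
label_match() {
  grep -il --label="$1" -- "$2"
}

# Stand-in for: aws s3 cp s3://mybucket/path/to/file.csv - | label_match file.csv searchterm
printf 'id,value\n1,SearchTerm\n' | label_match file.csv searchterm
```

Inside the while loop, the label would be "${file}", so grep prints the object name itself and no separate echo is needed.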
