How to use AWS CLI to only copy files in S3 bucket that match a given string pattern

Question

I'm using the AWS CLI to copy files from an S3 bucket to my R machine with a command like the one below:

  system(
    "aws s3 cp s3://my_bucket_location/ ~/my_r_location/ --recursive --exclude '*' --include '*trans*' --region us-east-1"
    )

This works as expected: it copies every file in my_bucket_location that has "trans" in its filename.

The problem I am facing is that other files follow a similar naming convention, and I don't want to import them in this step. For example, in the list below I only want to copy the first two files, not the last two:

File list
trans_120215.csv
trans_130215.csv
sum_trans_120215.csv
sum_trans_130215.csv

If I were using regex, I could make the pattern more specific, like "^trans_\\d+", to bring in just the first two files, but this doesn't seem possible with the AWS CLI. So my question is: is there a way to do more complex pattern matching with the AWS CLI, like below?

  system(
    "aws s3 cp s3://my_bucket_location/ ~/my_r_location/ --recursive --exclude '*' --include '^trans_\\d+' --region us-east-1"
    )

Please note that I can only use information about the files I want, i.e. that they match the pattern "^trans_\\d+". I can't rely on the fact that the unwanted files start with sum_, because this is only an example; there could be other files with similar names, like "check_trans_120215.csv".

I have considered alternatives like the ones below, but I am hoping there is a way to adjust the copy command so that I can avoid going down either of these routes:

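  • List all objects in the bucket > use regex in R to pick out the files I want > import only those files
  • Leave the copy command as is > delete the unwanted files on the R machine after copying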

Recommended answer

The alternatives that you have listed are the best options, because the AWS CLI's s3 commands don't support regex.
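Both alternatives from the question can be sketched in plain shell (callable from R via system()). This is a minimal sketch, not a definitive implementation: the function names filter_trans_keys, copy_trans_files and prune_non_trans are hypothetical, the bucket and directory are the placeholders from the question, and the listing step assumes key names without spaces. Because grep -E does not portably support \d, the regex is written as ^trans_[0-9]+.

```shell
#!/bin/sh

# Keep only names that start with "trans_" followed by at least one digit.
# Reads one name per line on stdin and writes the matches to stdout.
filter_trans_keys() {
    grep -E '^trans_[0-9]+'
}

# Alternative 1 (hypothetical wrapper): list the bucket, filter the names
# with a regex, then copy each match individually.
# Assumes the name is the 4th column of `aws s3 ls` output, which only
# holds for names that contain no spaces.
copy_trans_files() {
    aws s3 ls "s3://my_bucket_location/" --region us-east-1 |
        awk '{print $4}' |
        filter_trans_keys |
        while IFS= read -r key; do
            aws s3 cp "s3://my_bucket_location/$key" "$HOME/my_r_location/" \
                --region us-east-1
        done
}

# Alternative 2: after a broad copy, delete every regular file directly
# inside directory $1 whose name does not start with "trans_" plus a digit.
prune_non_trans() {
    dir=$1
    for path in "$dir"/*; do
        [ -f "$path" ] || continue      # skip subdirectories and empty globs
        name=${path##*/}                # strip the directory part
        case "$name" in
            trans_[0-9]*) ;;            # wanted file: keep it
            *) rm -- "$path" ;;         # anything else: remove it
        esac
    done
}
```

Nothing runs until one of the functions is called; copy_trans_files issues one aws s3 cp per matching file, which is simple but slower than a single recursive copy for large listings.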

From the AWS CLI documentation on the use of exclude and include filters:

Currently, there is no support for the use of UNIX style wildcards in a command's path arguments. However, most commands have --exclude "<value>" and --include "<value>" parameters that can achieve the desired result. These parameters perform pattern matching to either exclude or include a particular file or object. The following pattern symbols are supported.

*: Matches everything
?: Matches any single character
[sequence]: Matches any character in sequence
[!sequence]: Matches any character not in sequence
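For the specific file list in the question, these glob symbols may already be enough: a pattern without a leading * has to match from the start of the name, so trans_[0-9]* should select trans_120215.csv but not sum_trans_120215.csv. The sketch below uses the shell's own case globbing, which supports the same four symbols, as a stand-in for the CLI's matcher; that equivalence is an assumption, and matches_trans_glob is a hypothetical helper. It is not a general regex replacement (there is no way to say "one or more digits", for example).

```shell
#!/bin/sh

# Returns 0 when the name matches the glob trans_[0-9]*, 1 otherwise.
# The glob reads: literal "trans_", one digit, then anything.
matches_trans_glob() {
    case "$1" in
        trans_[0-9]*) return 0 ;;
        *) return 1 ;;
    esac
}

# Classify the example file names from the question.
for name in trans_120215.csv trans_130215.csv \
            sum_trans_120215.csv sum_trans_130215.csv; do
    if matches_trans_glob "$name"; then
        echo "include: $name"
    else
        echo "exclude: $name"
    fi
done

# The corresponding CLI filter would look like this (not verified here;
# --dryrun prints the planned copies without performing them):
#   aws s3 cp s3://my_bucket_location/ ~/my_r_location/ --recursive \
#       --exclude '*' --include 'trans_[0-9]*' --region us-east-1 --dryrun
```

Running the copy with --dryrun first is a cheap way to confirm that the filter selects only the intended objects before any files are transferred.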
