Delete S3 files from an AWS pipeline


Question

I would like to ask about a processing task I am trying to complete using a data pipeline in AWS, which I have not been able to get working.

Basically, I have 2 data nodes representing 2 MySQL databases, from which data is supposed to be extracted periodically and placed in an S3 bucket. This copy activity works fine: each day it selects every row that was added during the previous day (today - 1 day).

However, the bucket containing the collected data as CSVs should become the input for an EMR activity, which will process those files and aggregate the information. The problem is that I do not know how to remove the already processed files, or move them to a different bucket, so that I do not have to process all the files every day.

To clarify, I am looking for a way to move or remove already processed files in an S3 bucket from within a pipeline. Can I do that? Alternatively, is there a way to process only some of the files in an EMR activity, based on a naming convention or something else?

Recommended answer

Even better, create a Data Pipeline ShellCommandActivity and use the AWS command line tools.

Create a script with these two lines:

    sudo yum -y upgrade aws-cli
    aws s3 rm "$1" --recursive

The first line ensures you have the latest AWS CLI tools (it assumes a yum-based instance).

The second line removes a directory and all its contents; in S3 terms, it deletes every object under the given prefix. $1 is the first argument passed to the script.

In your ShellCommandActivity:

    "scriptUri": "s3://myBucket/scripts/theScriptAbove.sh",
    "scriptArgument": "s3://myBucket/myDirectoryToBeDeleted"

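For context, these two fields sit inside a larger activity object in the pipeline definition. A minimal sketch of such an object follows; the ids `CleanupProcessedFiles`, `MyEc2Resource` and `MyEmrActivity` are hypothetical names that do not come from the original answer. The `dependsOn` reference is there so that the cleanup should run only after the EMR activity has finished.

```json
{
  "id": "CleanupProcessedFiles",
  "type": "ShellCommandActivity",
  "runsOn": { "ref": "MyEc2Resource" },
  "dependsOn": { "ref": "MyEmrActivity" },
  "scriptUri": "s3://myBucket/scripts/theScriptAbove.sh",
  "scriptArgument": "s3://myBucket/myDirectoryToBeDeleted"
}
```
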
The details on how the aws s3 command works are at:

    http://docs.aws.amazon.com/cli/latest/reference/s3/index.html
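
Since the question also asks about moving processed files rather than deleting them, the same approach might be adapted with `aws s3 mv`. The following is only a sketch under stated assumptions: the function name `archive_prefix` and the `DRY_RUN` switch are illustrative inventions, and the script expects two arguments (a source prefix and an archive prefix) instead of one.

```shell
#!/bin/bash
# Sketch: archive processed files instead of deleting them.
# Usage: archive.sh <source-prefix> <archive-prefix>
# Both prefixes are hypothetical, for example
# s3://myBucket/input/ and s3://myArchiveBucket/processed/.

# Run the move; with DRY_RUN=1 the command is only printed.
archive_prefix() {
    local src="$1"
    local dest="$2"
    if [ -z "$src" ] || [ -z "$dest" ]; then
        echo "usage: archive_prefix <source-prefix> <archive-prefix>" >&2
        return 1
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "aws s3 mv $src $dest --recursive"
    else
        # Moves every object under src to dest and removes the originals.
        aws s3 mv "$src" "$dest" --recursive
    fi
}

# Run only when both arguments were supplied on the command line.
if [ "$#" -ge 2 ]; then
    archive_prefix "$1" "$2"
fi
```

With this variant, the ShellCommandActivity would need to pass two script arguments. Setting `DRY_RUN=1` prints the command without touching the bucket, which can be handy for checking the prefixes first.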
