Amazon Redshift - Unload to S3 - Dynamic S3 file name

Question

I have been using the UNLOAD statement in Redshift for a while now; it makes it easy to dump a file to S3 and then let people analyse it.

The time has come to try to automate it. We already have Amazon Data Pipeline running for several tasks, and I wanted to run an SQLActivity to execute the UNLOAD automatically. I use a SQL script hosted in S3.

The query itself is correct, but what I have been trying to figure out is how to assign the file name dynamically. For example:

UNLOAD('<the_query>')
TO 's3://my-bucket/' || to_char(current_date)
WITH CREDENTIALS '<credentials>'
ALLOWOVERWRITE
PARALLEL OFF

doesn't work, and of course I suspect that you can't execute functions (to_char) in the TO line. Is there any other way I can do it?

And if UNLOAD is not the way to go, do I have any other options for automating such tasks with the currently available infrastructure (Redshift + S3 + Data Pipeline; our Amazon EMR is not active yet)?

The only thing I thought could work (but I am not sure) is, instead of using a script file, to copy the script into the Script option of the SQLActivity (at the moment it points to a file) and reference {@ScheduleStartTime}.
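For what it's worth, a minimal sketch of what such an inline script might look like, assuming the statement is pasted directly into the SQLActivity's Script field and that Data Pipeline expands the #{...} expression before the SQL is sent to Redshift (the placeholders <the_query> and <credentials> are taken from the question, and the date format is only an example):

UNLOAD ('<the_query>')
TO 's3://my-bucket/#{format(@scheduledStartTime, 'YYYY-MM-dd')}/'
WITH CREDENTIALS '<credentials>'
ALLOWOVERWRITE
PARALLEL OFF;

Depending on how the expression evaluator treats the nested single quotes around the format string, the quoting may need escaping; I have not verified this exact combination.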

Answer

Why not use RedshiftCopyActivity to copy from Redshift to S3? The input is a RedshiftDataNode and the output is an S3DataNode, where you can specify an expression for directoryPath.

You can also specify the transformSql property in RedshiftCopyActivity to override the default value of select * from + inputRedshiftTable.
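As an illustration only (the query here is made up and is not part of the original answer), adding transformSql to the copy activity from the example below might look like this:

{
  "id": "RedshiftCopyActivityId1",
  "name": "DefaultRedshiftCopyActivity1",
  "type": "RedshiftCopyActivity",
  "transformSql": "select * from orders where order_date = current_date",
  "input": { "ref": "RedshiftDataNodeId1" },
  "output": { "ref": "S3DataNodeId1" },
  "schedule": { "ref": "ScheduleId1" },
  "runsOn": { "ref": "Ec2ResourceId1" }
}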

Example pipeline:

{
  "objects": [{
    "id": "CSVId1",
    "name": "DefaultCSV1",
    "type": "CSV"
  }, {
    "id": "RedshiftDatabaseId1",
    "databaseName": "dbname",
    "username": "user",
    "name": "DefaultRedshiftDatabase1",
    "*password": "password",
    "type": "RedshiftDatabase",
    "clusterId": "redshiftclusterId"
  }, {
    "id": "Default",
    "scheduleType": "timeseries",
    "failureAndRerunMode": "CASCADE",
    "name": "Default",
    "role": "DataPipelineDefaultRole",
    "resourceRole": "DataPipelineDefaultResourceRole"
  }, {
    "id": "RedshiftDataNodeId1",
    "schedule": { "ref": "ScheduleId1" },
    "tableName": "orders",
    "name": "DefaultRedshiftDataNode1",
    "type": "RedshiftDataNode",
    "database": { "ref": "RedshiftDatabaseId1" }
  }, {
    "id": "Ec2ResourceId1",
    "schedule": { "ref": "ScheduleId1" },
    "securityGroups": "MySecurityGroup",
    "name": "DefaultEc2Resource1",
    "role": "DataPipelineDefaultRole",
    "logUri": "s3://myLogs",
    "resourceRole": "DataPipelineDefaultResourceRole",
    "type": "Ec2Resource"
  }, {
    "myComment": "This object is used to control the task schedule.",
    "id": "ScheduleId1",
    "name": "RunOnce",
    "occurrences": "1",
    "period": "1 Day",
    "type": "Schedule",
    "startAt": "FIRST_ACTIVATION_DATE_TIME"
  }, {
    "id": "S3DataNodeId1",
    "schedule": { "ref": "ScheduleId1" },
    "directoryPath": "s3://my-bucket/#{format(@scheduledStartTime, 'YYYY-MM-dd-HH-mm-ss')}",
    "name": "DefaultS3DataNode1",
    "dataFormat": { "ref": "CSVId1" },
    "type": "S3DataNode"
  }, {
    "id": "RedshiftCopyActivityId1",
    "output": { "ref": "S3DataNodeId1" },
    "input": { "ref": "RedshiftDataNodeId1" },
    "schedule": { "ref": "ScheduleId1" },
    "name": "DefaultRedshiftCopyActivity1",
    "runsOn": { "ref": "Ec2ResourceId1" },
    "type": "RedshiftCopyActivity"
  }]
}
