Amazon Redshift - Unload to S3 - Dynamic S3 file name
Question
I have been using the UNLOAD statement in Redshift for a while now; it makes it easy to dump a file to S3 and then let people analyse it.
The time has come to try to automate it. We have Amazon Data Pipeline running for several tasks, and I wanted to run a SQLActivity to execute the UNLOAD automatically. I use a SQL script hosted in S3.
The query itself is correct, but what I have been trying to figure out is how to dynamically assign the name of the file. For example:
UNLOAD('<the_query>')
TO 's3://my-bucket/' || to_char(current_date)
WITH CREDENTIALS '<credentials>'
ALLOWOVERWRITE
PARALLEL OFF
doesn't work, and of course I suspect that you can't execute functions (to_char) in the "TO" line. Is there any other way I can do it?
And if UNLOAD is not the way, do I have any other options for automating such tasks with the currently available infrastructure (Redshift + S3 + Data Pipeline; our Amazon EMR is not active yet)?
The only thing that I thought could work (but I am not sure) is, instead of using a script file, to copy the script into the Script option in SQLActivity (at the moment it points to a file) and reference {@ScheduleStartTime}.
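That last idea is workable in principle: Data Pipeline evaluates #{...} expressions in an activity's fields before the SQL ever reaches Redshift, so the dynamic path can be expanded there. A hypothetical sketch of an inline-script SqlActivity (object ids, bucket, and the date format are assumptions; the query and credentials placeholders are left as in the question):

```json
{
  "id": "UnloadActivityId1",
  "name": "UnloadToS3",
  "type": "SqlActivity",
  "database": { "ref": "RedshiftDatabaseId1" },
  "runsOn": { "ref": "Ec2ResourceId1" },
  "schedule": { "ref": "ScheduleId1" },
  "script": "UNLOAD('<the_query>') TO 's3://my-bucket/#{format(@scheduledStartTime, 'YYYY-MM-dd')}/' WITH CREDENTIALS '<credentials>' ALLOWOVERWRITE PARALLEL OFF"
}
```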
Answer
Why not use RedshiftCopyActivity to copy from Redshift to S3? Input is RedshiftDataNode and output is S3DataNode where you can specify expression for directoryPath.
You can also specify the transformSql property in RedshiftCopyActivity to override the default value, which is select * from + the input Redshift table.
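As a sketch (the filter query is illustrative, not from the original answer), a copy activity carrying a transformSql override might look like:

```json
{
  "id": "RedshiftCopyActivityId1",
  "type": "RedshiftCopyActivity",
  "transformSql": "select * from orders where order_date = current_date",
  "input": { "ref": "RedshiftDataNodeId1" },
  "output": { "ref": "S3DataNodeId1" },
  "schedule": { "ref": "ScheduleId1" },
  "runsOn": { "ref": "Ec2ResourceId1" }
}
```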
Example pipeline:
{
  "objects": [
    {
      "id": "CSVId1",
      "name": "DefaultCSV1",
      "type": "CSV"
    },
    {
      "id": "RedshiftDatabaseId1",
      "databaseName": "dbname",
      "username": "user",
      "name": "DefaultRedshiftDatabase1",
      "*password": "password",
      "type": "RedshiftDatabase",
      "clusterId": "redshiftclusterId"
    },
    {
      "id": "Default",
      "scheduleType": "timeseries",
      "failureAndRerunMode": "CASCADE",
      "name": "Default",
      "role": "DataPipelineDefaultRole",
      "resourceRole": "DataPipelineDefaultResourceRole"
    },
    {
      "id": "RedshiftDataNodeId1",
      "schedule": { "ref": "ScheduleId1" },
      "tableName": "orders",
      "name": "DefaultRedshiftDataNode1",
      "type": "RedshiftDataNode",
      "database": { "ref": "RedshiftDatabaseId1" }
    },
    {
      "id": "Ec2ResourceId1",
      "schedule": { "ref": "ScheduleId1" },
      "securityGroups": "MySecurityGroup",
      "name": "DefaultEc2Resource1",
      "role": "DataPipelineDefaultRole",
      "logUri": "s3://myLogs",
      "resourceRole": "DataPipelineDefaultResourceRole",
      "type": "Ec2Resource"
    },
    {
      "myComment": "This object is used to control the task schedule.",
      "id": "DefaultSchedule1",
      "name": "RunOnce",
      "occurrences": "1",
      "period": "1 Day",
      "type": "Schedule",
      "startAt": "FIRST_ACTIVATION_DATE_TIME"
    },
    {
      "id": "S3DataNodeId1",
      "schedule": { "ref": "ScheduleId1" },
      "directoryPath": "s3://my-bucket/#{format(@scheduledStartTime, 'YYYY-MM-dd-HH-mm-ss')}",
      "name": "DefaultS3DataNode1",
      "dataFormat": { "ref": "CSVId1" },
      "type": "S3DataNode"
    },
    {
      "id": "RedshiftCopyActivityId1",
      "output": { "ref": "S3DataNodeId1" },
      "input": { "ref": "RedshiftDataNodeId1" },
      "schedule": { "ref": "ScheduleId1" },
      "name": "DefaultRedshiftCopyActivity1",
      "runsOn": { "ref": "Ec2ResourceId1" },
      "type": "RedshiftCopyActivity"
    }
  ]
}