Automating a Hive activity using AWS
Question
I would like to run my Hive script automatically every day, and one option for that is AWS Data Pipeline. The problem is that I export data from DynamoDB to S3 and then manipulate that data with a Hive script. The input and output are defined inside the Hive script itself, and that is where the trouble starts: a HiveActivity has to have an input and an output, but I have to specify them in the script file.
I am trying to find a way to automate this Hive script. Any ideas?
Cheers,
Recommended answer
You can disable staging on the HiveActivity so that it runs any arbitrary Hive script. The script is then responsible for defining its own input and output tables.
stage = false
Do something like:
{
  "name": "DefaultActivity1",
  "id": "ActivityId_1",
  "type": "HiveActivity",
  "stage": "false",
  "scriptUri": "s3://bucket/query.hql",
  "scriptVariable": [
    "param1=value1",
    "param2=value2"
  ],
  "schedule": {
    "ref": "ScheduleId_l"
  },
  "runsOn": {
    "ref": "EmrClusterId_1"
  }
},
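The activity above refers to a schedule and an EMR cluster that the answer does not show. As a rough sketch only, the two referenced objects could look something like the following for a daily run. The names, the `1 day` period, the start time and the `terminateAfter` value are assumptions chosen for illustration, not part of the original answer; the `id` values simply have to match the `ref` values used by the activity.

```json
{
  "objects": [
    {
      "id": "ScheduleId_l",
      "name": "EveryDay",
      "type": "Schedule",
      "period": "1 day",
      "startAt": "FIRST_ACTIVATION_DATE_TIME"
    },
    {
      "id": "EmrClusterId_1",
      "name": "DefaultEmrCluster",
      "type": "EmrCluster",
      "terminateAfter": "2 Hours",
      "schedule": {
        "ref": "ScheduleId_l"
      }
    }
  ]
}
```

In a complete pipeline definition these objects would sit in the same `objects` list as the HiveActivity, and the cluster would likely need further settings (instance types, roles and so on) that depend on your account.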