Hive execution hook
Problem description
I need to register a custom execution hook in Apache Hive. Please let me know if anybody knows how to do it.
My current environment is:
Hadoop: Cloudera version 4.1.2
Operating system: CentOS
Thanks,
Arun
There are several types of hooks, depending on the stage at which you want to inject your custom code:
- Driver run hooks (Pre/Post)
- Semantic analyzer hooks (Pre/Post)
- Execution hooks (Pre/Failure/Post)
- Client statistics publisher
If you run a script, the processing flow looks as follows:
- Driver.run() takes the command
- HiveDriverRunHook.preDriverRun() (HiveConf.ConfVars.HIVE_DRIVER_RUN_HOOKS)
- Driver.compile() starts processing the command: creates the abstract syntax tree
- AbstractSemanticAnalyzerHook.preAnalyze() (HiveConf.ConfVars.SEMANTIC_ANALYZER_HOOK)
- Semantic analysis
- AbstractSemanticAnalyzerHook.postAnalyze() (HiveConf.ConfVars.SEMANTIC_ANALYZER_HOOK)
- Create and validate the query plan (physical plan)
- Driver.execute(): ready to run the jobs
- ExecuteWithHookContext.run() (HiveConf.ConfVars.PREEXECHOOKS)
- ExecDriver.execute() runs all the jobs
- For each job at every HiveConf.ConfVars.HIVECOUNTERSPULLINTERVAL interval: ClientStatsPublisher.run() is called to publish statistics (HiveConf.ConfVars.CLIENTSTATSPUBLISHERS). If a task fails: ExecuteWithHookContext.run() (HiveConf.ConfVars.ONFAILUREHOOKS)
- Finish all the tasks
- ExecuteWithHookContext.run() (HiveConf.ConfVars.POSTEXECHOOKS)
- Before returning the result: HiveDriverRunHook.postDriverRun() (HiveConf.ConfVars.HIVE_DRIVER_RUN_HOOKS)
- Return the result.
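The flow above names the hooks by their HiveConf.ConfVars constants, while a script sets them by property key. The sketch below maps one to the other and builds the matching `set` statement. The property keys are the ones I believe Hive 0.11 uses; they are worth verifying against the HiveConf class of your version. The class name HookConfKeys and the method setStatement are made up for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative helper (not part of Hive): ConfVars constant -> property key.
// The keys are assumed from Hive 0.11's HiveConf; check your own version.
class HookConfKeys {
    static final Map<String, String> KEYS = new LinkedHashMap<>();
    static {
        // Listed in the order the hooks fire in the flow above.
        KEYS.put("HIVE_DRIVER_RUN_HOOKS", "hive.exec.driver.run.hooks");
        KEYS.put("SEMANTIC_ANALYZER_HOOK", "hive.semantic.analyzer.hook");
        KEYS.put("PREEXECHOOKS", "hive.exec.pre.hooks");
        KEYS.put("CLIENTSTATSPUBLISHERS", "hive.client.stats.publishers");
        KEYS.put("ONFAILUREHOOKS", "hive.exec.failure.hooks");
        KEYS.put("POSTEXECHOOKS", "hive.exec.post.hooks");
    }

    // Builds the statement to put at the beginning of a Hive script.
    static String setStatement(String confVar, String className) {
        String key = KEYS.get(confVar);
        if (key == null) {
            throw new IllegalArgumentException("Unknown ConfVars name: " + confVar);
        }
        return "set " + key + "=" + className + ";";
    }

    public static void main(String[] args) {
        // com.example.MyHook is a placeholder class name.
        for (String confVar : KEYS.keySet()) {
            System.out.println(setStatement(confVar, "com.example.MyHook"));
        }
    }
}
```

Running main prints one `set` line per hook type, which can be pasted into a script once the placeholder class name is replaced.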
For each hook I have indicated the interface you have to implement. The parentheses give the corresponding configuration property (as its HiveConf.ConfVars constant) that you have to set at the beginning of the script in order to register the class. For example, to set the pre-execution hook (9th stage of the workflow):
HiveConf.ConfVars.PREEXECHOOKS -> hive.exec.pre.hooks:
set hive.exec.pre.hooks=com.example.MyPreHook;
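A class registered this way might look roughly like the sketch below. So that it compiles without Hive on the classpath, it declares its own stand-ins for ExecuteWithHookContext and HookContext; the stand-in HookContext and its getQueryText() accessor are hypothetical and far smaller than the real class. In a real hook you would delete the stand-ins and import the types from org.apache.hadoop.hive.ql.hooks instead.

```java
// Stand-in for Hive's interface of the same name (assumption: only the
// single run method is mirrored here).
interface ExecuteWithHookContext {
    void run(HookContext hookContext) throws Exception;
}

// Hypothetical stand-in: the real HookContext exposes much more, and the
// query text is reached differently there.
class HookContext {
    private final String queryText;

    HookContext(String queryText) {
        this.queryText = queryText;
    }

    String getQueryText() {
        return queryText;
    }
}

// Would be registered with: set hive.exec.pre.hooks=com.example.MyPreHook;
class MyPreHook implements ExecuteWithHookContext {
    // Kept separate from run() so the message can be checked without I/O.
    static String describe(HookContext ctx) {
        return "pre-exec hook saw query: " + ctx.getQueryText();
    }

    @Override
    public void run(HookContext hookContext) throws Exception {
        // Called once before the jobs of a query start.
        System.err.println(describe(hookContext));
    }
}

class MyPreHookDemo {
    public static void main(String[] args) throws Exception {
        // Simulates the call Hive would make at the pre-execution stage.
        new MyPreHook().run(new HookContext("SELECT 1"));
    }
}
```

For actual use, the class would need to be public, live in the package named in the `set` statement (com.example here), and be packaged in a jar that is on Hive's classpath.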
Unfortunately these features aren't really documented, but you can always look into the Driver class to see the evaluation order of the hooks.
Remark: I assumed Hive 0.11.0 here; I don't think the Cloudera distribution differs (too much).