Hive execution hook


Problem description

I need to plug a custom execution hook into Apache Hive. Please let me know if anybody knows how to do it.

The current environment I am using is given below:

Hadoop: Cloudera version 4.1.2
Operating system: CentOS

Thanks, Arun

Solution

There are several types of hooks, depending on the stage at which you want to inject your custom code:

  • Driver run hooks (Pre/Post)
  • Semantic analyzer hooks (Pre/Post)
  • Execution hooks (Pre/Failure/Post)
  • Client statistics publisher

If you run a script, the processing flow looks as follows:

  1. Driver.run() takes the command
  2. HiveDriverRunHook.preDriverRun()
    (HiveConf.ConfVars.HIVE_DRIVER_RUN_HOOKS)
  3. Driver.compile() starts processing the command: creates the abstract syntax tree
  4. AbstractSemanticAnalyzerHook.preAnalyze()
    (HiveConf.ConfVars.SEMANTIC_ANALYZER_HOOK)
  5. Semantic analysis
  6. AbstractSemanticAnalyzerHook.postAnalyze()
    (HiveConf.ConfVars.SEMANTIC_ANALYZER_HOOK)
  7. Create and validate the query plan (physical plan)
  8. Driver.execute() : ready to run the jobs
  9. ExecuteWithHookContext.run()
    (HiveConf.ConfVars.PREEXECHOOKS)
  10. ExecDriver.execute() runs all the jobs
  11. For each job at every HiveConf.ConfVars.HIVECOUNTERSPULLINTERVAL interval:
    ClientStatsPublisher.run() is called to publish statistics
    (HiveConf.ConfVars.CLIENTSTATSPUBLISHERS)
    If a task fails: ExecuteWithHookContext.run()
    (HiveConf.ConfVars.ONFAILUREHOOKS)
  12. Finish all the tasks
  13. ExecuteWithHookContext.run()
    (HiveConf.ConfVars.POSTEXECHOOKS)
  14. Before returning the result: HiveDriverRunHook.postDriverRun()
    (HiveConf.ConfVars.HIVE_DRIVER_RUN_HOOKS)
  15. Return the result.
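
To make the ordering of steps 9 to 13 concrete, here is a toy model in plain Java. It is only a sketch: ToyDriver, ToyHook and ToyJob are hypothetical stand-ins invented for this example, not Hive classes, and skipping the post hooks after a failure is an assumption of the sketch that should be checked against the Driver source of your Hive version.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of steps 9-13 above. These are NOT Hive classes: ToyDriver,
// ToyHook and ToyJob are hypothetical stand-ins so the example compiles alone.
class ToyDriver {

    /** Stand-in for an execution hook (Hive's real interface is ExecuteWithHookContext). */
    interface ToyHook {
        void run(String query) throws Exception;
    }

    /** Stand-in for the jobs that make up a query. */
    interface ToyJob {
        void execute() throws Exception;
    }

    final List<ToyHook> preHooks = new ArrayList<>();
    final List<ToyHook> onFailureHooks = new ArrayList<>();
    final List<ToyHook> postHooks = new ArrayList<>();

    /** Runs the hooks around the job; returns 0 on success and 1 on failure. */
    int execute(String query, ToyJob job) throws Exception {
        for (ToyHook hook : preHooks) {
            hook.run(query);                 // step 9: pre-execution hooks
        }
        try {
            job.execute();                   // step 10: run the jobs
        } catch (Exception e) {
            for (ToyHook hook : onFailureHooks) {
                hook.run(query);             // step 11: failure branch
            }
            return 1;                        // assumption: post hooks are skipped
        }
        for (ToyHook hook : postHooks) {
            hook.run(query);                 // step 13: post-execution hooks
        }
        return 0;
    }

    public static void main(String[] args) throws Exception {
        ToyDriver driver = new ToyDriver();
        driver.preHooks.add(q -> System.out.println("pre: " + q));
        driver.onFailureHooks.add(q -> System.out.println("failed: " + q));
        driver.postHooks.add(q -> System.out.println("post: " + q));

        driver.execute("SELECT 1", () -> { });
        driver.execute("SELECT 2", () -> { throw new Exception("task failed"); });
    }
}
```

The point of the model is only that each hook list is a separate extension point invoked at a fixed position around job execution, so a hook registered under the wrong key simply fires at a different stage.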

For each hook I have indicated the interface you have to implement. The parentheses give the corresponding configuration property key, which you set at the beginning of the script to register the class. For example, to set the pre-execution hook (the 9th stage of the workflow):

HiveConf.ConfVars.PREEXECHOOKS -> hive.exec.pre.hooks :
set hive.exec.pre.hooks=com.example.MyPreHook;
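
Because the property value is just a class name (Hive describes hive.exec.pre.hooks as a comma-separated list of hook classes), the class has to be loadable by name and needs a no-argument constructor. The following self-contained sketch illustrates that mechanism with plain reflection. HookLoader, PreHook and this MyPreHook are hypothetical stand-ins for the example; a real hook would instead implement org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext and its run(HookContext) method, and its jar would have to be on Hive's classpath.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of turning a comma-separated list of class names into hook instances.
// HookLoader, PreHook and MyPreHook are hypothetical stand-ins, not Hive classes.
class HookLoader {

    /** Stand-in for the interface a pre-execution hook implements. */
    public interface PreHook {
        void run(String query) throws Exception;
    }

    /** Example hook: just logs the query it is about to see executed. */
    public static class MyPreHook implements PreHook {
        @Override
        public void run(String query) {
            System.out.println("about to run: " + query);
        }
    }

    /** Instantiates every class named in the comma-separated property value. */
    static List<PreHook> load(String csvClassNames) throws Exception {
        List<PreHook> hooks = new ArrayList<>();
        if (csvClassNames == null || csvClassNames.trim().isEmpty()) {
            return hooks; // nothing registered
        }
        for (String name : csvClassNames.split(",")) {
            Class<?> cls = Class.forName(name.trim());
            // Requires an accessible no-argument constructor.
            hooks.add((PreHook) cls.getDeclaredConstructor().newInstance());
        }
        return hooks;
    }

    public static void main(String[] args) throws Exception {
        // Plays the role of: set hive.exec.pre.hooks=<class name>;
        String property = MyPreHook.class.getName();
        for (PreHook hook : load(property)) {
            hook.run("SELECT 1");
        }
    }
}
```

If the class cannot be found or does not implement the expected interface, loading fails with a ClassNotFoundException or ClassCastException in this sketch, which mirrors the usual symptom of a hook jar missing from the classpath.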

Unfortunately these features aren't really documented, but you can always look into the Driver class to see the order in which the hooks are evaluated.

Remark: I assumed Hive 0.11.0 here; I don't think the Cloudera distribution differs (too much).
