Hive execution hook

Question

I need to register a custom execution hook in Apache Hive. Please let me know if anybody knows how to do it.

My current environment is:

Hadoop: Cloudera version 4.1.2
Operating system: CentOS

Thanks, Arun

Recommended answer

There are several types of hooks, depending on the stage at which you want to inject your custom code:

  • Driver run hooks (pre/post)
  • Semantic analyzer hooks (pre/post)
  • Execution hooks (pre/failure/post)
  • Client statistics publisher

If you run a script, the processing flow looks as follows:

  1. Driver.run() takes the command
  2. HiveDriverRunHook.preDriverRun()
    (HiveConf.ConfVars.HIVE_DRIVER_RUN_HOOKS)
  3. Driver.compile() starts processing the command: creates the abstract syntax tree
  4. AbstractSemanticAnalyzerHook.preAnalyze()
    (HiveConf.ConfVars.SEMANTIC_ANALYZER_HOOK)
  5. Semantic analysis
  6. AbstractSemanticAnalyzerHook.postAnalyze()
    (HiveConf.ConfVars.SEMANTIC_ANALYZER_HOOK)
  7. Create and validate the query plan (physical plan)
  8. Driver.execute(): ready to run the jobs
  9. ExecuteWithHookContext.run()
    (HiveConf.ConfVars.PREEXECHOOKS)
  10. ExecDriver.execute() runs all the jobs
  11. For each job at every HiveConf.ConfVars.HIVECOUNTERSPULLINTERVAL interval:
    ClientStatsPublisher.run() is called to publish statistics
    (HiveConf.ConfVars.CLIENTSTATSPUBLISHERS)
    If a task fails: ExecuteWithHookContext.run()
    (HiveConf.ConfVars.ONFAILUREHOOKS)
  12. Finish all the tasks
  13. ExecuteWithHookContext.run()
    (HiveConf.ConfVars.POSTEXECHOOKS)
  14. Before returning the result: HiveDriverRunHook.postDriverRun()
    (HiveConf.ConfVars.HIVE_DRIVER_RUN_HOOKS)
  15. Return the result.
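
The names in parentheses above are constants of Hive's HiveConf.ConfVars enum, whereas a script sets the string property key behind each constant. The Java sketch below lists that mapping as it is believed to appear in the HiveConf source. Only the hive.exec.pre.hooks entry is confirmed by this answer, so treat the other keys as assumptions and check them against the HiveConf class of your Hive version. HookConfKeys is a hypothetical helper written for illustration, not part of Hive.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical lookup table: ConfVars constant name -> property key. */
class HookConfKeys {

    /** Returns the mapping in the order the constants first appear in the flow. */
    static Map<String, String> keys() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("HIVE_DRIVER_RUN_HOOKS", "hive.exec.driver.run.hooks");
        m.put("SEMANTIC_ANALYZER_HOOK", "hive.semantic.analyzer.hook");
        m.put("PREEXECHOOKS", "hive.exec.pre.hooks");
        m.put("HIVECOUNTERSPULLINTERVAL", "hive.exec.counters.pull.interval");
        m.put("CLIENTSTATSPUBLISHERS", "hive.client.stats.publishers");
        m.put("ONFAILUREHOOKS", "hive.exec.failure.hooks");
        m.put("POSTEXECHOOKS", "hive.exec.post.hooks");
        return m;
    }

    public static void main(String[] args) {
        // Print one "CONSTANT -> key" line per entry.
        for (Map.Entry<String, String> e : keys().entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```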

For each hook I have indicated the interface you have to implement. The parentheses give the corresponding configuration property key, which you have to set at the beginning of the script in order to register your class. For example, to set the pre-execution hook (9th stage of the workflow):

HiveConf.ConfVars.PREEXECHOOKS -> hive.exec.pre.hooks:
set hive.exec.pre.hooks=com.example.MyPreHook;
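
A class such as com.example.MyPreHook could look roughly like the Java sketch below. So that it compiles without Hive on the classpath, it declares three small stand-in types that mirror only the parts of the Hive API it touches: the ExecuteWithHookContext interface named in the flow, plus HookContext.getQueryPlan() and QueryPlan.getQueryId(), which are assumed here to match the Hive classes of the same names. In a real hook you would delete the stand-ins, import the Hive types instead, declare the class public in package com.example, and make the compiled jar visible to Hive. The seen list and the log message are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-ins so the sketch is self-contained. In a real hook, replace them with:
//   org.apache.hadoop.hive.ql.hooks.ExecuteWithHookContext
//   org.apache.hadoop.hive.ql.hooks.HookContext
//   org.apache.hadoop.hive.ql.QueryPlan
interface ExecuteWithHookContext {
    void run(HookContext hookContext) throws Exception;
}

class QueryPlan {
    private final String queryId;
    QueryPlan(String queryId) { this.queryId = queryId; }
    String getQueryId() { return queryId; }
}

class HookContext {
    private final QueryPlan queryPlan;
    HookContext(QueryPlan queryPlan) { this.queryPlan = queryPlan; }
    QueryPlan getQueryPlan() { return queryPlan; }
}

/** Hypothetical pre-execution hook: records and logs the id of each query. */
class MyPreHook implements ExecuteWithHookContext {
    // Query ids seen by this instance; kept so the sketch has observable state.
    final List<String> seen = new ArrayList<>();

    @Override
    public void run(HookContext hookContext) {
        String queryId = hookContext.getQueryPlan().getQueryId();
        seen.add(queryId);
        System.err.println("About to execute query " + queryId);
    }
}

class MyPreHookDemo {
    public static void main(String[] args) {
        // Simulates what the driver would do at stage 9 of the flow.
        new MyPreHook().run(new HookContext(new QueryPlan("query-1")));
    }
}
```

The failure and post-execution hooks from the flow use the same ExecuteWithHookContext interface, so the same class shape applies to them; only the property key used for registration differs.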

Unfortunately, these features are not really documented, but you can always read the Driver class to see the order in which the hooks are evaluated.

Remark: I assumed Hive 0.11.0 here; I don't think the Cloudera distribution differs (too much).
