Airflow HiveCliHook 连接到远程配置单元集群? [英] Airflow HiveCliHook connection to remote hive cluster?

查看:119
本文介绍了Airflow HiveCliHook 连接到远程配置单元集群?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试从 Airflow 的本地副本连接到我的 Hive 服务器,但 HiveCliHook 似乎正在尝试连接到我的 Hive 本地副本.

I am trying to connect to my hive server from a local copy of Airflow, but it seems like the HiveCliHook is trying to connect to my local copy of Hive.

我正在运行以进行测试:

I'm running to following to test it:

import airflow
from airflow.models import Connection
from airflow.hooks.hive_hooks import  HiveCliHook

usr = 'myusername'
pss = 'mypass'

session = airflow.settings.Session()
hive_cli = session.query(Connection).filter(Connection.conn_id == 'hive_cli_default').all()[0]

hive_cli.host = 'hive_server.test.mydomain.com'
hive_cli.port = '9083'
hive_cli.login = usr
hive_cli.password = pss
hive_cli.schema = 'default'

session.commit()

hive = HiveCliHook()

hive.run_cli("select 1")

引发此错误的原因:

[2018-11-28 13:23:22,667] {base_hook.py:83} INFO - Using connection to: hive_server.test.mydomain.com
[2018-11-28 13:24:50,891] {hive_hooks.py:220} INFO - hive -f /tmp/airflow_hiveop_2Fdl2I/tmpBFoGp7  
[2018-11-28 13:24:55,548] {hive_hooks.py:235} INFO - Logging initialized using configuration in jar:file:/usr/local/apache-hive-2.3.4-bin/lib/hive-common-2.3.4.jar!/hive-log4j2.properties Async: true  
[2018-11-28 13:25:01,776] {hive_hooks.py:235} INFO - FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

有人知道我哪里出错了吗?

Does anyone have any idea where I'm going wrong?

推荐答案

  • 虽然您可以使用HiveCliOperator(未更改)在远程中连接和执行HQL语句> Hive-Server,唯一的要求是运行你的 Airflow workerbox 也必须包含 Hive 已安装二进制文件

    • While you can use the HiveCliOperator (unaltered) for connecting and executing HQL statements in remote Hive-Server, the only requirement is that the box that is running your Airflow worker must also contain Hive binaries installed

      这是因为 HiveCliHook 准备的nofollow noreferrer">hive-cli 命令将通过老式的 bashworker machine 中运行.现阶段,如果 Hive CLI 没有安装在运行此代码的机器(即您的 Airflow 工作器)中,它会像您的情况一样中断

      This is so because the hive-cli command prepared by HiveCliHook would be run in worker machine via good-old bash. At this stage, if Hive CLI is not installed in the machine where this code is running (i.e. your Airflow worker), it will break as in your case

      直接的解决方法是实现您自己的 RemoteHiveCliOperator

      Straight-forward workaround is to implement your own RemoteHiveCliOperator that

      • 创建一个SSHHook 到远程 Hive 服务器机器
      • 并通过 SSHHook 执行您的 HQL 语句,例如 这个

      事实上,这似乎是几乎所有 Airflow Operator 的普遍缺陷,默认情况下,他们希望每个工作人员都安装必要的软件包.docs 对此提出警告

      In fact this seems to be a universal drawback with almost all Airflow Operators that by default they expect requisite packages installed in every worker. The docs warn about it

      例如,如果您使用 HiveOperator,则 hive CLI 需要安装在那个盒子上

      For example, if you use the HiveOperator, the hive CLI needs to be installed on that box

      这篇关于Airflow HiveCliHook 连接到远程配置单元集群?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆