Airflow HiveCliHook 连接到远程配置单元集群? [英] Airflow HiveCliHook connection to remote hive cluster?
问题描述
我正在尝试从 Airflow 的本地副本连接到我的 Hive 服务器,但 HiveCliHook 似乎正在尝试连接到我的 Hive 本地副本.
I am trying to connect to my hive server from a local copy of Airflow, but it seems like the HiveCliHook is trying to connect to my local copy of Hive.
我正在运行以进行测试:
I'm running to following to test it:
import airflow
from airflow.models import Connection
from airflow.hooks.hive_hooks import HiveCliHook
usr = 'myusername'
pss = 'mypass'
session = airflow.settings.Session()
hive_cli = session.query(Connection).filter(Connection.conn_id == 'hive_cli_default').all()[0]
hive_cli.host = 'hive_server.test.mydomain.com'
hive_cli.port = '9083'
hive_cli.login = usr
hive_cli.password = pss
hive_cli.schema = 'default'
session.commit()
hive = HiveCliHook()
hive.run_cli("select 1")
引发此错误的原因:
[2018-11-28 13:23:22,667] {base_hook.py:83} INFO - Using connection to: hive_server.test.mydomain.com
[2018-11-28 13:24:50,891] {hive_hooks.py:220} INFO - hive -f /tmp/airflow_hiveop_2Fdl2I/tmpBFoGp7
[2018-11-28 13:24:55,548] {hive_hooks.py:235} INFO - Logging initialized using configuration in jar:file:/usr/local/apache-hive-2.3.4-bin/lib/hive-common-2.3.4.jar!/hive-log4j2.properties Async: true
[2018-11-28 13:25:01,776] {hive_hooks.py:235} INFO - FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
有人知道我哪里出错了吗?
Does anyone have any idea where I'm going wrong?
推荐答案
虽然您可以使用
HiveCliOperator
(未更改)在远程中连接和执行HQL
语句>Hive-Server
,唯一的要求是运行你的Airflow
worker
的 box 也必须包含Hive
已安装二进制文件While you can use the
HiveCliOperator
(unaltered) for connecting and executingHQL
statements in remoteHive-Server
, the only requirement is that the box that is running yourAirflow
worker
must also containHive
binaries installed这是因为 HiveCliHook 准备的nofollow noreferrer">hive-cli 命令将通过老式的
bash
在 worker machine 中运行.现阶段,如果Hive CLI
没有安装在运行此代码的机器(即您的 Airflow 工作器)中,它会像您的情况一样中断This is so because the hive-cli command prepared by
HiveCliHook
would be run in worker machine via good-oldbash
. At this stage, ifHive CLI
is not installed in the machine where this code is running (i.e. your Airflow worker), it will break as in your case直接的解决方法是实现您自己的
RemoteHiveCliOperator
Straight-forward workaround is to implement your own
RemoteHiveCliOperator
that事实上,这似乎是几乎所有 Airflow
Operator
的普遍缺陷,默认情况下,他们希望每个工作人员都安装必要的软件包.docs 对此提出警告In fact this seems to be a universal drawback with almost all Airflow
Operator
s that by default they expect requisite packages installed in every worker. The docs warn about it例如,如果您使用 HiveOperator,则 hive CLI 需要安装在那个盒子上
For example, if you use the HiveOperator, the hive CLI needs to be installed on that box
这篇关于Airflow HiveCliHook 连接到远程配置单元集群?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!