Airflow HiveCliHook connection to remote hive cluster?
I am trying to connect to my hive server from a local copy of Airflow, but it seems like the HiveCliHook is trying to connect to my local copy of Hive.
I'm running the following to test it:
import airflow
from airflow.models import Connection
from airflow.hooks.hive_hooks import HiveCliHook
usr = 'myusername'
pss = 'mypass'
session = airflow.settings.Session()
hive_cli = session.query(Connection).filter(Connection.conn_id == 'hive_cli_default').all()[0]
hive_cli.host = 'hive_server.test.mydomain.com'
hive_cli.port = '9083'
hive_cli.login = usr
hive_cli.password = pss
hive_cli.schema = 'default'
session.commit()
hive = HiveCliHook()
hive.run_cli("select 1")
Which is throwing this error:
[2018-11-28 13:23:22,667] {base_hook.py:83} INFO - Using connection to: hive_server.test.mydomain.com
[2018-11-28 13:24:50,891] {hive_hooks.py:220} INFO - hive -f /tmp/airflow_hiveop_2Fdl2I/tmpBFoGp7
[2018-11-28 13:24:55,548] {hive_hooks.py:235} INFO - Logging initialized using configuration in jar:file:/usr/local/apache-hive-2.3.4-bin/lib/hive-common-2.3.4.jar!/hive-log4j2.properties Async: true
[2018-11-28 13:25:01,776] {hive_hooks.py:235} INFO - FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
Does anyone have any idea where I'm going wrong?
While you can use the HiveCliOperator (unaltered) to connect to and execute HQL statements on a remote Hive server, the one requirement is that the box running your Airflow worker must also have the Hive binaries installed. This is because the hive-cli command prepared by HiveCliHook is run on the worker machine via good old bash. At that stage, if the Hive CLI is not installed on the machine where this code runs (i.e. your Airflow worker), it will break, as in your case.
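To make the failure mode concrete, here is a rough, simplified stand-in for what `HiveCliHook.run_cli()` effectively does (the helper name `build_local_hive_command` is illustrative, not an Airflow API): the HQL is written to a temp file and the *local* `hive` binary is invoked, which matches the `hive -f /tmp/airflow_hiveop_...` line in the log above.

```python
import subprocess
import tempfile

def build_local_hive_command(hql_file: str) -> list:
    # Mirrors the `hive -f /tmp/airflow_hiveop_.../tmp...` line in the log:
    # the connection's host/port only become Hive settings, but the `hive`
    # executable itself is resolved on the *local* machine.
    return ["hive", "-f", hql_file]

def run_cli(hql: str) -> str:
    # Simplified sketch of HiveCliHook.run_cli(): write the HQL to a temp
    # file, then shell out to the local Hive CLI. If Hive is not installed
    # on this box (the Airflow worker), this call fails -- regardless of
    # what host the `hive_cli_default` connection points at.
    with tempfile.NamedTemporaryFile("w", suffix=".hql") as f:
        f.write(hql)
        f.flush()
        proc = subprocess.run(
            build_local_hive_command(f.name),
            capture_output=True, text=True,
        )
        return proc.stdout
```

So the connection you edited in the session only changes *which metastore the local CLI talks to*, not *where the CLI runs*.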
A straightforward workaround is to implement your own RemoteHiveCliOperator that:
- creates an SSHHook to the remote Hive-server machine, and
- executes your HQL statement through that SSHHook.
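As a minimal sketch of that workaround (the function names are illustrative, not Airflow APIs, and a real operator would use Airflow's SSHHook rather than shelling out to `ssh` directly), the essential move is to run `hive -e` on the remote box where the binaries actually live:

```python
import shlex
import subprocess

def build_remote_hive_command(hql: str) -> str:
    # Quote the HQL so it survives the remote shell; `hive -e` runs an
    # inline statement on the machine where the binary is installed.
    return "hive -e {}".format(shlex.quote(hql))

def run_hql_over_ssh(user: str, host: str, hql: str) -> str:
    # Stand-in for what an SSHHook-based RemoteHiveCliOperator would do:
    # execute the Hive CLI on the remote Hive server instead of locally,
    # so the Airflow worker itself needs no Hive installation.
    result = subprocess.run(
        ["ssh", "{}@{}".format(user, host), build_remote_hive_command(hql)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

This trades the "Hive binaries on every worker" requirement for an SSH credential to the Hive server.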
In fact, this seems to be a universal drawback with almost all Airflow Operators: by default, they expect the requisite packages to be installed on every worker. The docs warn about it:
For example, if you use the HiveOperator, the hive CLI needs to be installed on that box