Airflow HiveCliHook connection to remote hive cluster?


Problem description

I am trying to connect to my hive server from a local copy of Airflow, but it seems like the HiveCliHook is trying to connect to my local copy of Hive.

I'm running the following to test it:

import airflow
from airflow.models import Connection
from airflow.hooks.hive_hooks import HiveCliHook

usr = 'myusername'
pss = 'mypass'

session = airflow.settings.Session()
hive_cli = session.query(Connection).filter(Connection.conn_id == 'hive_cli_default').all()[0]

hive_cli.host = 'hive_server.test.mydomain.com'
hive_cli.port = '9083'
hive_cli.login = usr
hive_cli.password = pss
hive_cli.schema = 'default'

session.commit()

hive = HiveCliHook()

hive.run_cli("select 1")

Which is throwing this error:

[2018-11-28 13:23:22,667] {base_hook.py:83} INFO - Using connection to: hive_server.test.mydomain.com
[2018-11-28 13:24:50,891] {hive_hooks.py:220} INFO - hive -f /tmp/airflow_hiveop_2Fdl2I/tmpBFoGp7  
[2018-11-28 13:24:55,548] {hive_hooks.py:235} INFO - Logging initialized using configuration in jar:file:/usr/local/apache-hive-2.3.4-bin/lib/hive-common-2.3.4.jar!/hive-log4j2.properties Async: true  
[2018-11-28 13:25:01,776] {hive_hooks.py:235} INFO - FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

Does anyone have any idea where I'm going wrong?

Solution

  • You can use the unaltered HiveOperator (which wraps HiveCliHook) to connect to a remote Hive server and execute HQL statements there. The only requirement is that the box running your Airflow worker must also have the Hive binaries installed.

  • This is because the hive CLI command prepared by HiveCliHook is run on the worker machine as a plain local shell command. If the Hive CLI is not installed on the machine where this code runs (i.e. your Airflow worker), it will break as in your case.
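Since the hook shells out to a local `hive` executable (the `hive -f /tmp/...` line in the log above), a quick way to see whether a given worker can run it at all is to look for that binary on the PATH. The following is only a minimal sketch using the Python standard library; the helper name `find_hive_cli` is made up for illustration and is not part of Airflow.

```python
import shutil


def find_hive_cli(binary="hive"):
    """Return the full path of the Hive CLI on this machine, or None if absent.

    `find_hive_cli` is a hypothetical helper, not an Airflow API.
    """
    return shutil.which(binary)


if __name__ == "__main__":
    path = find_hive_cli()
    if path is None:
        print("No hive binary on PATH; HiveCliHook cannot run on this worker.")
    else:
        print("Hive CLI found at {}".format(path))
```

Finding the binary only shows that the command can start. It says nothing about whether the local Hive client configuration points at the remote cluster.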


A straightforward workaround is to implement your own RemoteHiveCliOperator that:

  • Creates an SSHHook to the remote Hive server machine
  • Executes your HQL statement on that machine through the SSHHook
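The two steps above could be sketched roughly as follows. This is an illustration under assumptions, not a tested operator: the function names are hypothetical, and `ssh_hook` is assumed to behave like Airflow's SSHHook, i.e. `get_conn()` returns a paramiko-style client with `exec_command()`. It also assumes the `hive` CLI is on the remote user's PATH.

```python
import shlex


def build_hive_command(hql):
    """Return a shell-safe `hive -e` command line for one HQL statement."""
    return "hive -e {}".format(shlex.quote(hql))


def run_hql_over_ssh(ssh_hook, hql):
    """Run `hql` with the Hive CLI on the remote host behind `ssh_hook`.

    Assumption: `ssh_hook.get_conn()` returns a paramiko-style SSH client.
    Output is read fully into memory, so this suits small result sets only.
    """
    client = ssh_hook.get_conn()
    try:
        _stdin, stdout, stderr = client.exec_command(build_hive_command(hql))
        output = stdout.read().decode("utf-8")
        errors = stderr.read().decode("utf-8")
        # Exit status is available once the remote command has finished.
        exit_status = stdout.channel.recv_exit_status()
    finally:
        client.close()
    if exit_status != 0:
        raise RuntimeError(
            "remote hive exited with {}: {}".format(exit_status, errors)
        )
    return output
```

Inside a custom operator's `execute()` method this might be called as `run_hql_over_ssh(SSHHook(ssh_conn_id="hive_ssh"), "select 1")`, where `hive_ssh` is a placeholder connection id that would have to be defined in Airflow. Because the statement runs on the remote machine, the worker itself no longer needs the Hive binaries.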

In fact, this seems to be a drawback common to almost all Airflow operators: by default, they expect the requisite packages to be installed on every worker. The docs warn about it:

For example, if you use the HiveOperator, the hive CLI needs to be installed on that box.
