Start HiveThriftServer programmatically in Python


Problem Description



In the spark-shell (Scala), we import org.apache.spark.sql.hive.thriftserver._ to start the Hive Thrift server programmatically for a particular Hive context, as HiveThriftServer2.startWithContext(hiveContext), to expose a registered temp table for that particular session.

How can we do the same using Python? Is there a package/API in Python for importing HiveThriftServer? Any other thoughts/recommendations are appreciated.

We have used pyspark for creating a dataframe

Thanks

Ravi Narayanan

Solution

You can import it using the py4j Java gateway. The following code worked on Spark 2.0.2, and temp tables registered in the Python script could be queried through beeline.

from pyspark.sql import SparkSession
from py4j.java_gateway import java_import

# app_name and master are assumed to be defined elsewhere
spark = SparkSession \
        .builder \
        .appName(app_name) \
        .master(master) \
        .enableHiveSupport() \
        .config('spark.sql.hive.thriftServer.singleSession', True) \
        .getOrCreate()
sc = spark.sparkContext
sc.setLogLevel('INFO')

# Make the Thrift server class visible through the py4j gateway
java_import(sc._gateway.jvm, "org.apache.spark.sql.hive.thriftserver.HiveThriftServer2")

# Start the Thrift server in the JVM, passing the Java SQLContext that backs
# this pyspark session so both share the same session state.
sc._gateway.jvm.org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.startWithContext(spark._jwrapped)

spark.sql('CREATE TABLE IF NOT EXISTS myTable (id INT)')  # persisted to the metastore; example schema
data_file = "path to csv file with data"
dataframe = spark.read.option("header", "true").csv(data_file).cache()
dataframe.createOrReplaceTempView("myTempView")

Then go to beeline to check if it started correctly:

in terminal> $SPARK_HOME/bin/beeline
beeline> !connect jdbc:hive2://localhost:10000
beeline> show tables;
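Beeline is just one JDBC client; any Hive Thrift client can talk to the same endpoint. Below is a minimal sketch of doing the check from Python instead. The third-party "pyhive" package and the default port 10000 are assumptions, not something stated in the answer, and the query helper obviously needs a running server:

```python
def thrift_jdbc_url(host="localhost", port=10000, database="default"):
    """Build the JDBC URL that beeline (or any Hive JDBC client) would use."""
    return f"jdbc:hive2://{host}:{port}/{database}"

def query_thrift_server(sql, host="localhost", port=10000):
    """Run a query against a running Thrift server (sketch only).

    Requires the third-party 'pyhive' package (pip install pyhive) and a
    server started as shown above.
    """
    from pyhive import hive  # imported lazily; assumed to be installed
    conn = hive.Connection(host=host, port=port)
    cur = conn.cursor()
    cur.execute(sql)
    return cur.fetchall()

print(thrift_jdbc_url())  # the URL passed to beeline's !connect above
```

A call like query_thrift_server("show tables") would then return the same listing beeline prints.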

It should show the tables and temp tables/views created in Python, including "myTable" and "myTempView" above. It is necessary to use the same Spark session in order to see temporary views.

(See answer: Avoid starting HiveThriftServer2 with created context programmatically.)
NOTE: It is possible to access Hive tables even if the Thrift server is started from the terminal and connected to the same metastore; however, temp views cannot be accessed, because they live in the Spark session and are not written to the metastore.
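The scoping rule in the note above can be pictured with a small toy model (plain Python, no Spark, and not Spark's actual catalog implementation): persistent tables go to a metastore shared by all sessions, while temp views live in a per-session catalog.

```python
class Metastore:
    """Shared catalog of persistent tables (simplified model)."""
    def __init__(self):
        self.tables = set()

class Session:
    """Each session has its own temp-view catalog but shares the metastore."""
    def __init__(self, metastore):
        self.metastore = metastore
        self.temp_views = set()

    def create_table(self, name):
        self.metastore.tables.add(name)   # visible to every session

    def create_temp_view(self, name):
        self.temp_views.add(name)         # visible to this session only

    def show_tables(self):
        return sorted(self.metastore.tables | self.temp_views)

ms = Metastore()
python_session = Session(ms)   # the pyspark session that started the server
other_session = Session(ms)    # e.g. a Thrift server started from the terminal

python_session.create_table("myTable")
python_session.create_temp_view("myTempView")

print(python_session.show_tables())  # ['myTable', 'myTempView']
print(other_session.show_tables())   # ['myTable']
```

This is why startWithContext (plus spark.sql.hive.thriftServer.singleSession) matters: it puts the Thrift server inside the same session as the pyspark script, so beeline plays the role of python_session rather than other_session.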
