How to access existing table in Hive?
Problem Description
I am trying to access Hive from a Spark application written in Scala.
My code:
val hiveLocation = "hdfs://master:9000/user/hive/warehouse"
val conf = new SparkConf().setAppName("SOME APP NAME").setMaster("local[*]").set("spark.sql.warehouse.dir",hiveLocation)
val sc = new SparkContext(conf)
val spark = SparkSession
.builder()
.appName("SparkHiveExample")
.master("local[*]")
.config("spark.sql.warehouse.dir", hiveLocation)
.config("spark.driver.allowMultipleContexts", "true")
.enableHiveSupport()
.getOrCreate()
println("Start of SQL Session--------------------")
spark.sql("select * from test").show()
println("End of SQL session-------------------")
But it ends up with the error message:

Table or view not found

However, when I run show tables; in the Hive console, I can see that table and can run select * from test. Everything is under the "user/hive/warehouse" location. Just for testing, I also tried creating a table from Spark, to find out where the table ends up.
val spark = SparkSession
.builder()
.appName("SparkHiveExample")
.master("local[*]")
.config("spark.sql.warehouse.dir", hiveLocation)
.config("spark.driver.allowMultipleContexts", "true")
.enableHiveSupport()
.getOrCreate()
println("Start of SQL Session--------------------")
spark.sql("CREATE TABLE IF NOT EXISTS test11(name String)")
println("End of SQL session-------------------")
This code also executed properly (with a success note), but the strange thing is that I cannot find this table from the Hive console. Even when I run select * from TBLS; in MySQL (in my setup I configured MySQL as the metastore for Hive), I did not find the tables that were created from Spark.

Is the Spark warehouse location different from the one the Hive console uses? What do I have to do to access an existing Hive table from Spark?
From the Spark SQL programming guide:
Configuration of Hive is done by placing your hive-site.xml, core-site.xml (for security configuration), and hdfs-site.xml (for HDFS configuration) files in conf/.

When working with Hive, one must instantiate SparkSession with Hive support, including connectivity to a persistent Hive metastore, support for Hive serdes, and Hive user-defined functions. Users who do not have an existing Hive deployment can still enable Hive support. When not configured by hive-site.xml, the context automatically creates metastore_db in the current directory and creates a directory configured by spark.sql.warehouse.dir, which defaults to the directory spark-warehouse in the current directory where the Spark application is started.
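As a quick way to see whether a run fell back to that default, you can check the launch directory for the artifacts the guide mentions (a sketch; run it from wherever the Spark application was started):

```shell
# If Spark could not find hive-site.xml it creates an embedded Derby
# metastore (metastore_db/) and a local warehouse (spark-warehouse/)
# in the current directory -- their presence explains why tables end
# up invisible to the Hive console's MySQL-backed metastore.
if [ -d metastore_db ]; then
  echo "embedded Derby metastore in use: metastore_db/ exists"
else
  echo "no local metastore_db found"
fi
```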
You need to add a hive-site.xml config file to the resources directory. Here are the minimum values needed for Spark to work with Hive (set the host to your Hive host):
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://host:9083</value>
    <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
  </property>
</configuration>
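With that file on the classpath (for an sbt or Maven project, src/main/resources is the usual place), a minimal sketch of the session from the question, now resolving tables against the shared metastore (the table name test is taken from the question):

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: assumes hive-site.xml from above is on the classpath
// and the Hive metastore service is running on the configured host.
val spark = SparkSession
  .builder()
  .appName("SparkHiveExample")
  .master("local[*]")
  .enableHiveSupport() // reads hive-site.xml and connects to the metastore
  .getOrCreate()

// Listing the catalog is a quick check that you are talking to the
// shared metastore rather than a freshly created local metastore_db.
spark.catalog.listTables().show()
spark.sql("select * from test").show()
```

Note that this sketch does not build a separate SparkContext first: SparkSession manages its own context, and a pre-existing context created without Hive settings can be silently reused by getOrCreate().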