How to access an existing table in Hive?
Question
I am trying to access Hive from a Spark application written in Scala.
My code:
val hiveLocation = "hdfs://master:9000/user/hive/warehouse"
val conf = new SparkConf()
  .setAppName("SOME APP NAME")
  .setMaster("local[*]")
  .set("spark.sql.warehouse.dir", hiveLocation)
val sc = new SparkContext(conf)
val spark = SparkSession
  .builder()
  .appName("SparkHiveExample")
  .master("local[*]")
  .config("spark.sql.warehouse.dir", hiveLocation)
  .config("spark.driver.allowMultipleContexts", "true")
  .enableHiveSupport()
  .getOrCreate()
println("Start of SQL Session--------------------")
spark.sql("select * from test").show()
println("End of SQL session-------------------")
but it ends with the error message:

Table or view not found
But when I run show tables; under the hive console, I can see that table and can run select * from test. All of them are in the "user/hive/warehouse" location. Just for testing, I also tried creating a table from Spark, just to find out the table location.
val spark = SparkSession
  .builder()
  .appName("SparkHiveExample")
  .master("local[*]")
  .config("spark.sql.warehouse.dir", hiveLocation)
  .config("spark.driver.allowMultipleContexts", "true")
  .enableHiveSupport()
  .getOrCreate()
println("Start of SQL Session--------------------")
spark.sql("CREATE TABLE IF NOT EXISTS test11(name String)")
println("End of SQL session-------------------")
This code also executed properly (with a success note), but the strange thing is that I cannot find this table from the hive console.
Even when I run select * from TBLS; in mysql (in my setup I configured mysql as the metastore for hive), I did not find the tables that were created from Spark.
Is the Spark warehouse location different from the one the hive console uses?

What do I have to do if I need to access an existing Hive table from Spark?
Answer
From the Spark SQL programming guide (relevant parts highlighted):
Configuration of Hive is done by placing your hive-site.xml, core-site.xml (for security configuration), and hdfs-site.xml (for HDFS configuration) files in conf/.

When working with Hive, one must instantiate SparkSession with Hive support, including connectivity to a persistent Hive metastore, support for Hive serdes, and Hive user-defined functions. Users who do not have an existing Hive deployment can still enable Hive support. When not configured by hive-site.xml, the context automatically creates metastore_db in the current directory and creates a directory configured by spark.sql.warehouse.dir, which defaults to the directory spark-warehouse in the current directory in which the Spark application is started.
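That default behavior is exactly what the question is running into, and it can be sketched like this; an illustrative snippet only, assuming Spark is on the classpath and no hive-site.xml is present (the app and table names are made up):

```scala
import org.apache.spark.sql.SparkSession

// Sketch: with no hive-site.xml on the classpath, this session falls back to
// an embedded Derby metastore (metastore_db/) plus a local warehouse dir,
// both created in the application's working directory.
val spark = SparkSession
  .builder()
  .appName("LocalMetastoreDemo") // hypothetical app name
  .master("local[*]")
  .enableHiveSupport()
  .getOrCreate()

// A table created here is registered only in the local Derby metastore, so
// the Hive console (which reads the MySQL metastore) will never see it.
spark.sql("CREATE TABLE IF NOT EXISTS demo_tbl(name STRING)")

// Shows where this session actually writes table data.
println(spark.conf.get("spark.sql.warehouse.dir"))
```

This is why the tables created from Spark never show up in the MySQL TBLS table: the two sides are talking to different metastores.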
You need to add a hive-site.xml config file to the resource dir. Here are the minimum values needed for Spark to work with Hive (set the host to the host of hive):
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://host:9083</value>
    <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
  </property>
</configuration>
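With that file on the classpath, the SparkSession only needs enableHiveSupport() to reach the shared metastore; a minimal sketch (the table name test comes from the question, and the metastore host is whatever you put in hive-site.xml):

```scala
import org.apache.spark.sql.SparkSession

// Sketch: hive-site.xml on the classpath points hive.metastore.uris at the
// shared (MySQL-backed) metastore, so no warehouse dir, no explicit
// SparkContext, and no allowMultipleContexts workaround is needed.
val spark = SparkSession
  .builder()
  .appName("SparkHiveExample")
  .master("local[*]")
  .enableHiveSupport()
  .getOrCreate()

// Now resolves against the same metastore the hive console uses.
spark.sql("select * from test").show()
```

Note that the original code also created a separate SparkContext before the SparkSession; the builder above creates its own context, which is why allowMultipleContexts is no longer required.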