How to add Hive properties at runtime in spark-shell

Problem Description

How do you set a Hive property such as hive.metastore.warehouse.dir at runtime? Or, at least, is there a more dynamic way of setting a property like the above than putting it in a file like spark_home/conf/hive-site.xml?
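
For context, a minimal sketch of the kind of runtime configuration being asked about (my illustration, not part of the question; the path and app name are placeholders) is to pass the property when building the session instead of editing hive-site.xml:

import org.apache.spark.sql.SparkSession

// Pass the Hive property at session-construction time rather than via
// spark_home/conf/hive-site.xml (requires a Hive-enabled Spark build)
val spark = SparkSession.builder()
  .appName("runtime_hive_props")
  .master("local")
  .config("spark.hadoop.hive.metastore.warehouse.dir", "/tmp/metastore_db_2")
  .enableHiveSupport()
  .getOrCreate()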

Answer

I faced the same issue, and for me it was solved by setting the Hive properties from Spark (2.4.0). Please find below all the options, via spark-shell, spark-submit and SparkConf.

Option 1 (spark-shell)

spark-shell --conf spark.hadoop.hive.metastore.warehouse.dir=some_path\metastore_db_2

Initially I tried spark-shell with hive.metastore.warehouse.dir set to some_path\metastore_db_2. Then I got the following warning:

Warning: Ignoring non-spark config property: hive.metastore.warehouse.dir=C:\winutils\hadoop-2.7.1\bin\metastore_db_2

Although when I create a Hive table with:

bigDf.write.mode("overwrite").saveAsTable("big_table")

the Hive metadata is stored correctly under the metastore_db_2 folder.

When I use spark.hadoop.hive.metastore.warehouse.dir, the warning disappears and the results are still saved in the metastore_db_2 directory.
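
A quick way to double-check where the table actually landed (my addition, not part of the original answer) is to ask Spark for the table's storage location from the same spark-shell session:

// The "Location" row of the output should point at the metastore_db_2 path
spark.sql("DESCRIBE FORMATTED big_table").show(100, false)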

Option 2 (spark-submit)

In order to use hive.metastore.warehouse.dir when submitting a job with spark-submit, I followed these steps.

First, I wrote some code to save some random data with Hive:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val sparkConf = new SparkConf().setAppName("metastore_test").setMaster("local")
val spark = SparkSession.builder().config(sparkConf).getOrCreate()

import spark.implicits._
var dfA = spark.createDataset(Seq(
      (1, "val1", "p1"),
      (2, "val1", "p2"),
      (3, "val2", "p3"),
      (3, "val3", "p4"))).toDF("id", "value", "p")

dfA.write.mode("overwrite").saveAsTable("metastore_test")

spark.sql("select * from metastore_test").show(false)

Next, I submitted the job:

spark-submit --class org.tests.Main \
        --conf spark.hadoop.hive.metastore.warehouse.dir=C:\winutils\hadoop-2.7.1\bin\metastore_db_2 \
        spark-scala-test_2.11-0.1.jar

The metastore_test table was created correctly under the C:\winutils\hadoop-2.7.1\bin\metastore_db_2 folder.

Option 3 (SparkConf)

Via SparkSession in the Spark code.

val sparkConf = new SparkConf()
      .setAppName("metastore_test")
      .set("spark.hadoop.hive.metastore.warehouse.dir", "C:\winutils\hadoop-2.7.1\bin\metastore_db_2")
      .setMaster("local")
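
For completeness, a sketch of how that SparkConf is then turned into a session and used (continuing from the snippet above; the data and table name simply mirror the earlier example):

// Build the session from the SparkConf above and write a table as before;
// the metadata should again end up under metastore_db_2
val spark = SparkSession.builder().config(sparkConf).getOrCreate()

import spark.implicits._
Seq((1, "val1"), (2, "val2")).toDF("id", "value")
  .write.mode("overwrite").saveAsTable("metastore_test")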

This attempt was also successful.

The question that still remains is: why do I have to prefix the property with spark.hadoop in order for it to work as expected?
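
One way to observe what the spark.hadoop. prefix does (my own check, not from the original answer): Spark forwards every spark.hadoop.* entry into the Hadoop Configuration it builds, with the prefix stripped, so the value is guaranteed to reach the Hive/Hadoop layer. This can be verified from the session:

// If the property was passed as spark.hadoop.hive.metastore.warehouse.dir,
// it shows up in the Hadoop configuration without the spark.hadoop. prefix
println(spark.sparkContext.hadoopConfiguration.get("hive.metastore.warehouse.dir"))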
