How to add Hive properties at runtime in spark-shell
Question
How do you set a Hive property like hive.metastore.warehouse.dir at runtime? Or is there at least a more dynamic way of setting such a property than putting it in a file like spark_home/conf/hive-site.xml?
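For reference, the static alternative mentioned in the question would look roughly like this (a sketch only; the warehouse path value is a hypothetical placeholder):

```xml
<!-- spark_home/conf/hive-site.xml (sketch; the path is a placeholder) -->
<configuration>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/some/path/warehouse</value>
  </property>
</configuration>
```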
Answer
I faced the same issue, and for me it worked by setting the Hive properties from Spark (2.4.0). Please find below all the options, via spark-shell, spark-submit, and SparkConf.
Option 1 (spark-shell)
spark-shell --conf spark.hadoop.hive.metastore.warehouse.dir=some_path\metastore_db_2
Initially I tried spark-shell with hive.metastore.warehouse.dir set to some_path\metastore_db_2. Then I got the following warning:
Warning: Ignoring non-spark config property: hive.metastore.warehouse.dir=C:\winutils\hadoop-2.7.1\bin\metastore_db_2
Although when I create a Hive table with:
bigDf.write.mode("overwrite").saveAsTable("big_table")
the Hive metadata is stored correctly under the metastore_db_2 folder.
When I use spark.hadoop.hive.metastore.warehouse.dir, the warning disappears and the results are still saved in the metastore_db_2 directory.
Option 2 (spark-submit)
In order to use hive.metastore.warehouse.dir when submitting a job with spark-submit, I followed these steps.
First I wrote some code to save some random data with Hive:
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val sparkConf = new SparkConf().setAppName("metastore_test").setMaster("local")
val spark = SparkSession.builder().config(sparkConf).getOrCreate()
import spark.implicits._

// Build a small DataFrame and persist it as a Hive table
val dfA = spark.createDataset(Seq(
  (1, "val1", "p1"),
  (2, "val1", "p2"),
  (3, "val2", "p3"),
  (3, "val3", "p4"))).toDF("id", "value", "p")

dfA.write.mode("overwrite").saveAsTable("metastore_test")
spark.sql("select * from metastore_test").show(false)
Next I submitted the job with:
spark-submit --class org.tests.Main \
  --conf spark.hadoop.hive.metastore.warehouse.dir=C:\winutils\hadoop-2.7.1\bin\metastore_db_2 \
  spark-scala-test_2.11-0.1.jar
The metastore_test table was properly created under the C:\winutils\hadoop-2.7.1\bin\metastore_db_2 folder.
Option 3 (SparkConf)
Via SparkSession in the Spark code.
val sparkConf = new SparkConf()
.setAppName("metastore_test")
.set("spark.hadoop.hive.metastore.warehouse.dir", "C:\\winutils\\hadoop-2.7.1\\bin\\metastore_db_2")
.setMaster("local")
This attempt was also successful.

The question that still remains is why I have to prefix the property with spark.hadoop in order for it to work as expected?
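My understanding (an assumption on my part, not stated in the answer above) is that Spark only accepts configuration keys starting with spark., and that it copies any key of the form spark.hadoop.* into the Hadoop Configuration with that prefix stripped, which is how the value eventually reaches Hive. A minimal, self-contained sketch of that prefix handling (simplified; this is not Spark's actual implementation):

```scala
// Sketch of the spark.hadoop.* prefix handling (assumption: simplified model)
val sparkProps = Map(
  "spark.app.name" -> "metastore_test",
  "spark.hadoop.hive.metastore.warehouse.dir" -> "some_path/metastore_db_2")

// Keys under spark.hadoop.* are copied into the Hadoop Configuration
// with the "spark.hadoop." prefix removed; other spark.* keys are not.
val hadoopProps = sparkProps.collect {
  case (key, value) if key.startsWith("spark.hadoop.") =>
    key.stripPrefix("spark.hadoop.") -> value
}

println(hadoopProps("hive.metastore.warehouse.dir"))
```

This also explains the warning in Option 1: a bare hive.metastore.warehouse.dir key has no spark. prefix, so spark-shell ignores it instead of forwarding it.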