Missing hive-site when using spark-submit YARN cluster mode
Problem Description
I'm using HDP 2.5.3 and have been trying to debug some YARN container classpath issues.
Since HDP includes both Spark 1.6 and 2.0.0, there are some conflicting versions in play.
The users I support are able to run Spark2 with Hive queries successfully in YARN client
mode, but not in cluster
mode: they get errors such as tables not found, because the Metastore connection is never established.
I am guessing that setting --driver-class-path /etc/spark2/conf:/etc/hive/conf
or passing --files /etc/spark2/conf/hive-site.xml
to spark-submit
would work, but why isn't hive-site.xml
already loaded from the conf
folder?
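For reference, the two workarounds I have in mind would look roughly like this (a sketch only; the application JAR and main class are placeholders, not from the original question):

```shell
# Option 1: prepend the Spark2 and Hive conf dirs to the driver classpath
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-class-path /etc/spark2/conf:/etc/hive/conf \
  --class com.example.MyApp \
  myapp.jar

# Option 2: ship hive-site.xml into each container's working directory
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --files /etc/spark2/conf/hive-site.xml \
  --class com.example.MyApp \
  myapp.jar
```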
According to the Hortonworks docs, hive-site
should be placed in $SPARK_HOME/conf
, and it is...
I do see hdfs-site.xml
and core-site.xml
, and other files that are part of HADOOP_CONF_DIR
, for example; this is from the YARN UI container info.
2232355 4 drwx------ 2 yarn hadoop 4096 Aug 2 21:59 ./__spark_conf__
2232379 4 -r-x------ 1 yarn hadoop 2358 Aug 2 21:59 ./__spark_conf__/topology_script.py
2232381 8 -r-x------ 1 yarn hadoop 4676 Aug 2 21:59 ./__spark_conf__/yarn-env.sh
2232392 4 -r-x------ 1 yarn hadoop 569 Aug 2 21:59 ./__spark_conf__/topology_mappings.data
2232398 4 -r-x------ 1 yarn hadoop 945 Aug 2 21:59 ./__spark_conf__/taskcontroller.cfg
2232356 4 -r-x------ 1 yarn hadoop 620 Aug 2 21:59 ./__spark_conf__/log4j.properties
2232382 12 -r-x------ 1 yarn hadoop 8960 Aug 2 21:59 ./__spark_conf__/hdfs-site.xml
2232371 4 -r-x------ 1 yarn hadoop 2090 Aug 2 21:59 ./__spark_conf__/hadoop-metrics2.properties
2232387 4 -r-x------ 1 yarn hadoop 662 Aug 2 21:59 ./__spark_conf__/mapred-env.sh
2232390 4 -r-x------ 1 yarn hadoop 1308 Aug 2 21:59 ./__spark_conf__/hadoop-policy.xml
2232399 4 -r-x------ 1 yarn hadoop 1480 Aug 2 21:59 ./__spark_conf__/__spark_conf__.properties
2232389 4 -r-x------ 1 yarn hadoop 1602 Aug 2 21:59 ./__spark_conf__/health_check
2232385 4 -r-x------ 1 yarn hadoop 913 Aug 2 21:59 ./__spark_conf__/rack_topology.data
2232377 4 -r-x------ 1 yarn hadoop 1484 Aug 2 21:59 ./__spark_conf__/ranger-hdfs-audit.xml
2232383 4 -r-x------ 1 yarn hadoop 1020 Aug 2 21:59 ./__spark_conf__/commons-logging.properties
2232357 8 -r-x------ 1 yarn hadoop 5721 Aug 2 21:59 ./__spark_conf__/hadoop-env.sh
2232391 4 -r-x------ 1 yarn hadoop 281 Aug 2 21:59 ./__spark_conf__/slaves
2232373 8 -r-x------ 1 yarn hadoop 6407 Aug 2 21:59 ./__spark_conf__/core-site.xml
2232393 4 -r-x------ 1 yarn hadoop 812 Aug 2 21:59 ./__spark_conf__/rack-topology.sh
2232394 4 -r-x------ 1 yarn hadoop 1044 Aug 2 21:59 ./__spark_conf__/ranger-hdfs-security.xml
2232395 8 -r-x------ 1 yarn hadoop 4956 Aug 2 21:59 ./__spark_conf__/metrics.properties
2232386 8 -r-x------ 1 yarn hadoop 4221 Aug 2 21:59 ./__spark_conf__/task-log4j.properties
2232380 4 -r-x------ 1 yarn hadoop 64 Aug 2 21:59 ./__spark_conf__/ranger-security.xml
2232372 20 -r-x------ 1 yarn hadoop 19975 Aug 2 21:59 ./__spark_conf__/yarn-site.xml
2232397 4 -r-x------ 1 yarn hadoop 1006 Aug 2 21:59 ./__spark_conf__/ranger-policymgr-ssl.xml
2232374 4 -r-x------ 1 yarn hadoop 29 Aug 2 21:59 ./__spark_conf__/yarn.exclude
2232384 4 -r-x------ 1 yarn hadoop 1606 Aug 2 21:59 ./__spark_conf__/container-executor.cfg
2232396 4 -r-x------ 1 yarn hadoop 1000 Aug 2 21:59 ./__spark_conf__/ssl-server.xml
2232375 4 -r-x------ 1 yarn hadoop 1 Aug 2 21:59 ./__spark_conf__/dfs.exclude
2232359 8 -r-x------ 1 yarn hadoop 7660 Aug 2 21:59 ./__spark_conf__/mapred-site.xml
2232378 16 -r-x------ 1 yarn hadoop 14474 Aug 2 21:59 ./__spark_conf__/capacity-scheduler.xml
2232376 4 -r-x------ 1 yarn hadoop 884 Aug 2 21:59 ./__spark_conf__/ssl-client.xml
As you can see, hive-site
is not there, even though I definitely have conf/hive-site.xml
for spark-submit to take
[spark@asthad006 conf]$ pwd && ls -l
/usr/hdp/2.5.3.0-37/spark2/conf
total 32
-rw-r--r-- 1 spark spark 742 Mar 6 15:20 hive-site.xml
-rw-r--r-- 1 spark spark 620 Mar 6 15:20 log4j.properties
-rw-r--r-- 1 spark spark 4956 Mar 6 15:20 metrics.properties
-rw-r--r-- 1 spark spark 824 Aug 2 22:24 spark-defaults.conf
-rw-r--r-- 1 spark spark 1820 Aug 2 22:24 spark-env.sh
-rwxr-xr-x 1 spark spark 244 Mar 6 15:20 spark-thrift-fairscheduler.xml
-rw-r--r-- 1 hive hadoop 918 Aug 2 22:24 spark-thrift-sparkconf.conf
So, I don't think I am supposed to place hive-site in HADOOP_CONF_DIR
since HIVE_CONF_DIR
is separate, but my question is: how do we get Spark2 to pick up hive-site.xml
without needing to manually pass it as a parameter at runtime?
EDIT: Naturally, since I'm on HDP, I am using Ambari. The previous cluster admin installed Spark2 clients on all of the machines, so all of the YARN NodeManagers that could be potential Spark drivers should have the same config files.
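Since the premise is that every potential driver host has the same config, one way to spot-check that assumption (hostnames below are hypothetical placeholders for the NodeManager list) is to compare checksums of hive-site.xml across the nodes:

```shell
# Compare hive-site.xml checksums across candidate driver hosts.
# node01..node03 are placeholders; substitute your NodeManager hostnames.
for h in node01 node02 node03; do
  ssh "$h" "md5sum /usr/hdp/2.5.3.0-37/spark2/conf/hive-site.xml"
done
```

If the checksums differ, a driver landing on the odd host out would see a different (or missing) Metastore configuration.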
Solution
You can use the Spark property spark.yarn.dist.files
and specify the path to hive-site.xml there.
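A sketch of what that looks like, using the conf path from the question (the per-job flag and the persistent spark-defaults.conf entry are two ways to set the same property):

```shell
# Per job, on the command line:
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.dist.files=/etc/spark2/conf/hive-site.xml \
  --class com.example.MyApp \
  myapp.jar

# Or persistently, as a line in $SPARK_HOME/conf/spark-defaults.conf,
# so no per-job flag is needed:
#   spark.yarn.dist.files  /etc/spark2/conf/hive-site.xml
```

Files listed in spark.yarn.dist.files are distributed into each YARN container's working directory, which is on the container classpath, so the driver can find hive-site.xml in cluster mode.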