Apache Spark's deployment issue (cluster-mode) with Hive
Problem Description
Edit:
I'm developing a Spark application that reads data from multiple structured schemas, and I'm trying to aggregate the information from those schemas. My application runs well when I run it locally, but when I run it on a cluster, I'm having trouble with the configuration (most probably with hive-site.xml) or with the submit-command arguments. I've looked at other related posts but couldn't find a solution SPECIFIC to my scenario. I've described in detail below what commands I tried and what errors I got. I'm new to Spark and might be missing something trivial, but I can provide more information to support my question.
Original question:
I've been trying to run my Spark application on a 6-node Hadoop cluster bundled with HDP 2.3 components.
Here is the component information that might be useful in suggesting a solution:
Cluster information: 6-node cluster:
128 GB RAM
24 cores
8 TB HDD
Components used in the application:
HDP - 2.3
Spark - 1.3.1
$ hadoop version:
Hadoop 2.7.1.2.3.0.0-2557
Subversion git@github.com:hortonworks/hadoop.git -r 9f17d40a0f2046d217b2bff90ad6e2fc7e41f5e1
Compiled by jenkins on 2015-07-14T13:08Z
Compiled with protoc 2.5.0
From source with checksum 54f9bbb4492f92975e84e390599b881d
Scenario:
I'm trying to use SparkContext and HiveContext to take full advantage of Spark's real-time querying of its data structures, such as DataFrames. The dependencies used in my application are:
<dependency> <!-- Spark dependency -->
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.3.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.10</artifactId>
<version>1.3.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.10</artifactId>
<version>1.3.1</version>
</dependency>
<dependency>
<groupId>com.databricks</groupId>
<artifactId>spark-csv_2.10</artifactId>
<version>1.4.0</version>
</dependency>
Below are the submit commands and the corresponding error logs I'm getting:
Submit Command1:
spark-submit --class working.path.to.Main \
--master yarn \
--deploy-mode cluster \
--num-executors 17 \
--executor-cores 8 \
--executor-memory 25g \
--driver-memory 25g \
--num-executors 5 \
application-with-all-dependencies.jar
Error Log1:
User class threw exception: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
Submit Command2:
spark-submit --class working.path.to.Main \
--master yarn \
--deploy-mode cluster \
--num-executors 17 \
--executor-cores 8 \
--executor-memory 25g \
--driver-memory 25g \
--num-executors 5 \
--files /etc/hive/conf/hive-site.xml \
application-with-all-dependencies.jar
Error Log2:
User class threw exception: java.lang.NumberFormatException: For input string: "5s"
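This kind of failure typically means a time-interval setting in hive-site.xml (a value like "5s") is being read by older client code that expects a bare number. A minimal sketch of the failure mode, assuming a hypothetical `parseRetryDelay` helper standing in for the real Hive client parsing logic:

```java
public class ParseDemo {
    // Hypothetical simplification: an older Hive client reads a retry-delay
    // setting expecting a plain number of seconds, not a time-suffixed value.
    static long parseRetryDelay(String value) {
        // Throws java.lang.NumberFormatException: For input string: "5s"
        // when the shipped config carries a newer value like "5s".
        return Long.parseLong(value);
    }

    public static void main(String[] args) {
        System.out.println(parseRetryDelay("5"));   // prints 5
        try {
            parseRetryDelay("5s");                  // fails, as in the error log
        } catch (NumberFormatException e) {
            System.out.println(e.getMessage());     // For input string: "5s"
        }
    }
}
```

So the exception points at the contents of the hive-site.xml being shipped with `--files`, not at the submit command itself.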
Since I don't have administrative permissions, I cannot modify the configuration. Well, I could contact the IT engineer and make the changes, but I'm looking for a solution that involves fewer changes to the configuration files, if possible!
Configuration changes were suggested here: https://hadoopist.wordpress.com/2016/02/23/how-to-resolve-error-yarn-applicationmaster-user-class-threw-exception-java-lang-runtimeexception-java-lang-numberformatexception-for-input-string-5s-in-spark-submit/
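For reference, the change suggested there amounts to stripping the time suffixes from the metastore settings in the hive-site.xml that gets shipped to the cluster. A sketch of the edited properties (property names per the linked post; verify the exact names and values against your cluster's file):

```xml
<property>
  <name>hive.metastore.client.connect.retry.delay</name>
  <!-- was "5s"; the older Hive code bundled with Spark 1.3.1 cannot parse the suffix -->
  <value>5</value>
</property>
<property>
  <name>hive.metastore.client.socket.timeout</name>
  <!-- was "1800s" -->
  <value>1800</value>
</property>
```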
Then I tried passing various jar files as arguments, as suggested in other discussion forums.
Submit Command3:
spark-submit --class working.path.to.Main \
--master yarn \
--deploy-mode cluster \
--num-executors 17 \
--executor-cores 8 \
--executor-memory 25g \
--driver-memory 25g \
--num-executors 5 \
--jars /usr/hdp/2.3.0.0-2557/spark/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/2.3.0.0-2557/spark/lib/datanucleus-core-3.2.10.jar,/usr/hdp/2.3.0.0-2557/spark/lib/datanucleus-rdbms-3.2.9.jar \
--files /etc/hive/conf/hive-site.xml \
application-with-all-dependencies.jar
Error Log3:
User class threw exception: java.lang.NumberFormatException: For input string: "5s"
I didn't understand what happened with the following command and couldn't analyze the error log.
Submit Command4:
spark-submit --class working.path.to.Main \
--master yarn \
--deploy-mode cluster \
--num-executors 17 \
--executor-cores 8 \
--executor-memory 25g \
--driver-memory 25g \
--num-executors 5 \
--jars /usr/hdp/2.3.0.0-2557/spark/lib/*.jar \
--files /etc/hive/conf/hive-site.xml \
application-with-all-dependencies.jar
Error Log4:
Application application_1461686223085_0014 failed 2 times due to AM Container for appattempt_1461686223085_0014_000002 exited with exitCode: 10
For more detailed output, check application tracking page:http://cluster-host:XXXX/cluster/app/application_1461686223085_0014Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e10_1461686223085_0014_02_000001
Exit code: 10
Stack trace: ExitCodeException exitCode=10:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 10
Failing this attempt. Failing the application.
Any other possible options? Any kind of help will be highly appreciated. Please let me know if you need any other information.
Thank you.
Recommended Answer
The solution explained here worked for my case: https://community.hortonworks.com/questions/5798/spark-hive-tables-not-found-when-running-in-yarn-c.html. There are two locations where hive-site.xml resides, which can be confusing. Use --files /usr/hdp/current/spark-client/conf/hive-site.xml instead of --files /etc/hive/conf/hive-site.xml. I didn't have to add the jars for my configuration. Hope this helps someone struggling with a similar problem. Thanks.
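Putting it together, the working invocation differs from Submit Command2 only in the `--files` path (a sketch; class name, executor counts, and memory settings are taken from the question above):

```shell
spark-submit --class working.path.to.Main \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 17 \
  --executor-cores 8 \
  --executor-memory 25g \
  --driver-memory 25g \
  --files /usr/hdp/current/spark-client/conf/hive-site.xml \
  application-with-all-dependencies.jar
```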