Submitting spark app as a yarn job from Eclipse and Spark Context

Problem Description

I can already submit local spark jobs (written in Scala) from my Eclipse IDE. However, I would like to modify my Spark context (inside my application) so that when I 'Run' the app (inside Eclipse), the job will be sent to my remote cluster using Yarn as a resource manager.

Using spark-submit, I can successfully submit the job to the cluster as:

spark-submit --class <main class> --master yarn-cluster <jar>

I want to achieve the same result inside the IDE. My sbt config (app root directory) looks like:

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.1"
libraryDependencies += "org.apache.spark" %% "spark-yarn" % "1.5.1" % "provided"

Inside my app:

val conf = new SparkConf().setAppName("xxx").setMaster("yarn-cluster")

However, I am getting the following error:

Detected yarn-cluster mode, but isn't running on a cluster. Deployment to YARN is not supported directly by SparkContext. Please use spark-submit.
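
For context, the error is thrown when the SparkContext is constructed. A minimal, self-contained sketch of a driver that reproduces it (the object name and the trivial job are illustrative, not from the original question) could look like:

import org.apache.spark.{SparkConf, SparkContext}

object YarnSubmitFromIde {                        // illustrative name
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("xxx")
      .setMaster("yarn-cluster")                  // rejected by SparkContext when run from the IDE
    val sc = new SparkContext(conf)               // the error above is raised here
    println(sc.parallelize(1 to 100).sum())       // trivial job, just to exercise the cluster
    sc.stop()
  }
}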

Recommended Answer

1) According to the research I have conducted, you cannot use yarn-cluster as the master in your code when submitting remotely from Eclipse; use yarn-client instead.

new SparkConf().setAppName("test-app").setMaster("yarn-client");

Check this Cloudera resource; they shed some light on what might be the constraint preventing you from running your "interactive" application in cluster mode.
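
Putting this together, a driver configured for yarn-client mode might look roughly like the sketch below; the packaged-jar path and the trivial job are assumptions for illustration, not part of the original answer:

import org.apache.spark.{SparkConf, SparkContext}

object YarnClientFromIde {                        // illustrative name
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("test-app")
      .setMaster("yarn-client")                   // client mode: the driver stays in the Eclipse JVM
      // Hypothetical path: ship the packaged application jar so the executors can load your classes
      .setJars(Seq("target/scala-2.10/my-app_2.10-1.0.jar"))
    val sc = new SparkContext(conf)
    println(sc.parallelize(1 to 100).count())
    sc.stop()
  }
}

In yarn-client mode the driver runs locally (inside your Eclipse "Run"), so the session stays interactive, while YARN allocates the executors on the cluster.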

2) You might run into the problem of resources not being properly copied to the cluster. What solved the problem in my case was including the following files in the classpath of the project (without any fanciness, for now I just copied them into the src/java directory of the project; an sbt alternative is sketched after the list):

  • core-site.xml
  • hdfs-site.xml
  • yarn-site.xml
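
As an alternative to copying the files into src/java, an sbt setting along these lines should also put a local configuration directory on the classpath (the "conf" directory name is an assumption; sbt 0.13 syntax):

// build.sbt: treat a local "conf" directory (holding core-site.xml, hdfs-site.xml and
// yarn-site.xml) as unmanaged resources, so they end up on the runtime classpath
unmanagedResourceDirectories in Compile += baseDirectory.value / "conf"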

Ensure that core-site.xml in particular is on the classpath, because none of the tutorials I have read mentioned it. And you will run into trouble, since without the fs.defaultFS configuration present, Spark will consider the destination directory to be on the same file system as the source (your local file system) rather than on the remote HDFS filesystem.
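
A quick way to verify that core-site.xml is actually being picked up is to read the setting back through the Hadoop Configuration API before creating the SparkContext; a small sanity-check sketch (not part of the original answer):

import org.apache.hadoop.conf.Configuration

object CheckHadoopConf {                          // illustrative name
  def main(args: Array[String]): Unit = {
    val hadoopConf = new Configuration()          // loads core-default.xml and core-site.xml from the classpath
    // Expect something like hdfs://<namenode-host>:8020; the default "file:///" means
    // core-site.xml was not found and Spark would fall back to the local file system
    println(hadoopConf.get("fs.defaultFS"))
  }
}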
