Trying to run Spark on EMR using the AWS SDK for Java, but it skips the remote JAR stored on S3
Problem Description
I'm trying to run Spark on EMR using the SDK for Java, but I'm having issues getting the spark-submit to use a JAR that I have stored on S3. Here is the relevant code:
public String launchCluster() throws Exception {
    StepFactory stepFactory = new StepFactory();
    // Creates a cluster flow step for debugging
    StepConfig enableDebugging = new StepConfig().withName("Enable debugging")
            .withActionOnFailure("TERMINATE_JOB_FLOW")
            .withHadoopJarStep(stepFactory.newEnableDebuggingStep());
    // Here is the original code before I tried command-runner.jar.
    // When using this, I get a ClassNotFoundException for
    // org.apache.spark.SparkConf. This is because for some reason,
    // the super-jar that I'm generating doesn't include apache spark.
    // Even so, I believe EMR should already have Spark installed if
    // I configure this correctly...
    // HadoopJarStepConfig runExampleConfig = new HadoopJarStepConfig()
    //         .withJar(JAR_LOCATION)
    //         .withMainClass(MAIN_CLASS);
    HadoopJarStepConfig runExampleConfig = new HadoopJarStepConfig()
            .withJar("command-runner.jar")
            .withArgs(
                    "spark-submit",
                    "--master", "yarn",
                    "--deploy-mode", "cluster",
                    "--class", SOME_MAIN_CLASS,
                    SOME_S3_PATH_TO_SUPERJAR,
                    "-useSparkLocal", "false"
            );
    StepConfig customExampleStep = new StepConfig().withName("Example Step")
            .withActionOnFailure("TERMINATE_JOB_FLOW")
            .withHadoopJarStep(runExampleConfig);
    // Create Applications so that the request knows to launch
    // the cluster with support for Hadoop and Spark.
    // Unsure if Hadoop is necessary...
    Application hadoopApp = new Application().withName("Hadoop");
    Application sparkApp = new Application().withName("Spark");
    RunJobFlowRequest request = new RunJobFlowRequest().withName("spark-cluster")
            .withReleaseLabel("emr-5.15.0")
            .withSteps(enableDebugging, customExampleStep)
            .withApplications(hadoopApp, sparkApp)
            .withLogUri(LOG_URI)
            .withServiceRole("EMR_DefaultRole")
            .withJobFlowRole("EMR_EC2_DefaultRole")
            .withVisibleToAllUsers(true)
            .withInstances(new JobFlowInstancesConfig()
                    .withInstanceCount(3)
                    .withKeepJobFlowAliveWhenNoSteps(true)
                    .withMasterInstanceType("m3.xlarge")
                    .withSlaveInstanceType("m3.xlarge")
            );
    // "result" was undefined in the excerpt as posted; the request is
    // presumably submitted through an EMR client, e.g.:
    AmazonElasticMapReduce emr = AmazonElasticMapReduceClientBuilder.defaultClient();
    RunJobFlowResult result = emr.runJobFlow(request);
    return result.getJobFlowId();
}
The steps complete without error, but it doesn't actually output anything... When I check the logs, stderr includes the following:
Warning: Skip remote jar s3://somebucket/myservice-1.0-super.jar.
and
18/07/17 22:08:31 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
I'm not sure what the issue is based on the log. I believe I am installing Spark correctly on the cluster. Also, to give some context: when I use withJar directly with the super-JAR stored on S3 instead of command-runner.jar (and without withArgs), it correctly grabs the JAR, but then it doesn't have Spark installed - I get a ClassNotFoundException for SparkConf (and JavaSparkContext, depending on what my Spark job code tries to create first).
Any pointers would be much appreciated!
Recommended Answer
I think that if you are using a recent EMR release (emr-5.17.0, for instance), the --master parameter should be yarn-cluster instead of yarn in the runExampleConfig statement.

I had the same problem and, after this change, it works fine for me.
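As a minimal sketch, the change amounts to swapping one value in the argument list handed to command-runner.jar. The helper class and method below are hypothetical (they exist only to isolate the argument change and make it checkable); the placeholder values stand in for SOME_MAIN_CLASS and SOME_S3_PATH_TO_SUPERJAR from the question:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that builds the spark-submit argument list passed
// to the command-runner.jar step via withArgs(...).
public class SparkSubmitArgs {
    public static List<String> buildArgs(String mainClass, String s3JarPath) {
        return Arrays.asList(
                "spark-submit",
                "--master", "yarn-cluster",  // was "yarn" in the original step
                "--deploy-mode", "cluster",
                "--class", mainClass,
                s3JarPath,
                "-useSparkLocal", "false");
    }

    public static void main(String[] args) {
        // Placeholders for SOME_MAIN_CLASS and SOME_S3_PATH_TO_SUPERJAR.
        System.out.println(buildArgs("com.example.MySparkJob",
                "s3://somebucket/myservice-1.0-super.jar"));
    }
}
```

Note that yarn-cluster is the older combined form of --master yarn --deploy-mode cluster; newer Spark versions document the two-flag form, so it is worth verifying against the Spark version your EMR release ships.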