How can I run a Spark job programmatically

Problem description

I want to run a Spark job programmatically - submit the SparkPi calculation to a remote cluster directly from IntelliJ IDEA (my laptop):

import org.apache.spark.{SparkConf, SparkContext}

import scala.math.random

object SparkPi {

  def main(args: Array[String]) {
    // The driver connects to the standalone master on the remote cluster
    val conf = new SparkConf().setAppName("Spark Pi")
      .setMaster("spark://host-name:7077")
    val spark = new SparkContext(conf)
    val slices = if (args.length > 0) args(0).toInt else 2
    val n = 100000 * slices
    // Monte Carlo estimate: count random points that fall inside the unit circle
    val count = spark.parallelize(1 to n, slices).map { i =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      if (x * x + y * y < 1) 1 else 0
    }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * count / n)
    spark.stop()
  }

}

However, when I run it, I observe the following error:

14/12/08 11:31:20 ERROR security.UserGroupInformation: PriviledgedActionException as:remeniuk (auth:SIMPLE) cause:java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException: Unknown exception in doAs
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1421)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:52)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:113)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:156)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: java.security.PrivilegedActionException: java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    ... 4 more

When I run the same script with spark-submit from my laptop, I see the same error.

Only when I upload the jar to the remote cluster (the machine where the master runs) does the job complete successfully:

./bin/spark-submit --master spark://host-name:7077 --class com.viaden.crm.spark.experiments.SparkPi ../spark-experiments_2.10-0.1-SNAPSHOT.jar
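(Editor's note: the programmatic counterpart of uploading the jar is to list it on the SparkConf via setJars, which ships the named jars to the executors so they can load the application classes. A minimal sketch; the jar path below is an assumption and must point at your actual assembled artifact:)

import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: adjust the (hypothetical) path to your built jar
val conf = new SparkConf()
  .setAppName("Spark Pi")
  .setMaster("spark://host-name:7077")
  .setJars(Seq("target/scala-2.10/spark-experiments_2.10-0.1-SNAPSHOT.jar"))
val spark = new SparkContext(conf)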

Recommended answer

Judging from the exception stack, this is most likely a firewall issue on your local machine: in standalone mode the executors have to connect back to the driver running on your laptop, and when those inbound connections are blocked, the remote calls time out exactly as in the trace above (the failure originates in CoarseGrainedExecutorBackend, i.e. on the executor side).

See this similar case: Intermittent Timeout Exception using Spark
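(If the firewall does turn out to be the culprit, one common workaround - a sketch, not part of the original answer; the host address and port numbers are assumptions - is to pin the ports the driver listens on, so fixed firewall rules can allow the executors' inbound connections:)

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("Spark Pi")
  .setMaster("spark://host-name:7077")
  // Address of the laptop as seen from the cluster (assumed value)
  .set("spark.driver.host", "192.168.1.10")
  // Fix the normally random listening ports so they can be opened in the firewall
  .set("spark.driver.port", "51000")
  .set("spark.fileserver.port", "51001")   // Spark 1.x HTTP file server
  .set("spark.broadcast.port", "51002")    // Spark 1.x HTTP broadcast server
  .set("spark.blockManager.port", "51003")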
