Connecting to a remote Spark master - Java / Scala
Question
I created a 3-node (1 master, 2 workers) Apache Spark cluster in AWS. I'm able to submit jobs to the cluster from the master, but I cannot get it to work remotely.
/* SimpleApp.scala */
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "/usr/local/spark/README.md" // Should be some file on your system
    val conf = new SparkConf().setAppName("Simple Application").setMaster("spark://ec2-54-245-111-320.compute-1.amazonaws.com:7077")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println(s"Lines with a: $numAs, Lines with b: $numBs")
    sc.stop()
  }
}
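To compile this from a local machine, the project needs Spark dependencies matching the cluster; the stack trace below shows spark-core_2.10-2.0.2 on the classpath, so a minimal build.sbt sketch might look like this (the project name is hypothetical; spark-sql is only needed for the SparkSession API used in the answer):

name := "simple-app" // hypothetical project name
version := "1.0"
scalaVersion := "2.10.6" // any Scala 2.10.x, matching spark-core_2.10 in the logs
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.0.2",
  "org.apache.spark" %% "spark-sql" % "2.0.2"
)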
From the master, I can see:
Spark Master at spark://ip-171-13-22-125.ec2.internal:7077
URL: spark://ip-171-13-22-125.ec2.internal:7077
REST URL: spark://ip-171-13-22-125.ec2.internal:6066 (cluster mode)
So when I execute SimpleApp.scala from my local machine, it fails to connect to the Spark Master:
2017-02-04 19:59:44,074 INFO [appclient-register-master-threadpool-0] client.StandaloneAppClient$ClientEndpoint (Logging.scala:54) [] - Connecting to master spark://ec2-54-245-111-320.compute-1.amazonaws.com:7077...
2017-02-04 19:59:44,166 WARN [appclient-register-master-threadpool-0] client.StandaloneAppClient$ClientEndpoint (Logging.scala:87) [] - Failed to connect to spark://ec2-54-245-111-320.compute-1.amazonaws.com:7077
org.apache.spark.SparkException: Exception thrown in awaitResult
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77) ~[spark-core_2.10-2.0.2.jar:2.0.2]
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75) ~[spark-core_2.10-2.0.2.jar:2.0.2]
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33) ~[scala-library-2.10.0.jar:?]
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) ~[spark-core_2.10-2.0.2.jar:2.0.2]
However, I know it would have worked if I had set the master to local, because then everything would run locally. Instead, I want my client to connect to this remote master. How can I accomplish that? The Spark configuration looks fine; I can even telnet to that public DNS and port, and I configured /etc/hosts with the public DNS and hostname of each EC2 instance.

I want to be able to submit jobs to this remote master. What am I missing?
Answer
To bind the master's host name/IP, go to your Spark installation's conf directory (spark-2.0.2-bin-hadoop2.7/conf) and create a spark-env.sh file using the command below. The standalone master identifies itself by a single host name, so it should be started with the public DNS name that remote clients use, rather than the internal ip-….ec2.internal name shown above.
cp spark-env.sh.template spark-env.sh
Open the spark-env.sh file in an editor and add the line below with the host name/IP of your master.
SPARK_MASTER_HOST=ec2-54-245-111-320.compute-1.amazonaws.com
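For context, a minimal spark-env.sh sketch; SPARK_MASTER_PORT and SPARK_MASTER_WEBUI_PORT are optional and shown here at their documented defaults:

# spark-env.sh - sourced by the standalone launch scripts
SPARK_MASTER_HOST=ec2-54-245-111-320.compute-1.amazonaws.com # public DNS that remote clients will use
SPARK_MASTER_PORT=7077 # master RPC port (default)
SPARK_MASTER_WEBUI_PORT=8080 # master web UI port (default)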
Stop and restart Spark using stop-all.sh and start-all.sh. Both scripts live in the sbin directory of the Spark installation and are run on the master node.
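A sketch, assuming the spark-2.0.2-bin-hadoop2.7 layout named above:

# run on the master node, from the Spark installation root
./sbin/stop-all.sh
./sbin/start-all.sh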
Now you can connect to the remote master using:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("SparkSample")
  .master("spark://ec2-54-245-111-320.compute-1.amazonaws.com:7077")
  .getOrCreate()
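Putting the pieces together, a sketch of the question's SimpleApp rewritten against this session; the README path is carried over from the question and must be readable from the workers, since textFile is evaluated on the executors:

/* SimpleApp.scala - sketch combining the question's job with the answer's session */
import org.apache.spark.sql.SparkSession

object SimpleApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SparkSample")
      .master("spark://ec2-54-245-111-320.compute-1.amazonaws.com:7077")
      .getOrCreate()

    // the path is resolved on the executors, so the file must exist on every worker
    val logData = spark.sparkContext.textFile("/usr/local/spark/README.md", 2).cache()
    val numAs = logData.filter(_.contains("a")).count()
    val numBs = logData.filter(_.contains("b")).count()
    println(s"Lines with a: $numAs, Lines with b: $numBs")

    spark.stop()
  }
}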
For more information on setting environment variables, see http://spark.apache.org/docs/latest/spark-standalone.html#cluster-launch-scripts