Scala Spark connect to remote cluster
Problem description
I wish to connect to a remote cluster and execute a Spark process. So, from what I have read, this is specified in the SparkConf.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("MyAppName")
  .setMaster("spark://my_ip:7077")
Where my_ip is the IP address of my cluster. Unfortunately, I get connection refused. So, I am guessing some credentials must be added to connect correctly. How would I specify the credentials? It seems it would be done with .set(key, value), but I have no leads on this.
Recommended answer
There are two things missing:
- The cluster manager should be set to yarn (setMaster("yarn")) and the deploy-mode to cluster; your current setup is used for Spark standalone. More info here: http://spark.apache.org/docs/latest/configuration.html#application-properties
- Also, you need to get the yarn-site.xml and core-site.xml files from the cluster and put them in HADOOP_CONF_DIR, so that Spark can pick up YARN settings, such as the IP of your master node. More info: https://theckang.github.io/2015/12/31/remote-spark-jobs-on-yarn.html (a minimal configuration sketch follows this list)
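For concreteness, here is a minimal Scala sketch of that configuration. It is an illustration under stated assumptions, not the asker's actual code: it assumes HADOOP_CONF_DIR on the machine launching the job points at the yarn-site.xml and core-site.xml copied from the cluster, and that the deploy mode is supplied on the spark-submit command line as noted below.

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: the master URL becomes "yarn" instead of spark://my_ip:7077.
// Spark locates the YARN ResourceManager through the yarn-site.xml /
// core-site.xml placed in HADOOP_CONF_DIR, so no host/port is hard-coded.
// The cluster deploy mode is normally passed to spark-submit
// (--deploy-mode cluster) rather than set in code.
val conf = new SparkConf()
  .setAppName("MyAppName")
  .setMaster("yarn")

val sc = new SparkContext(conf)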
By the way, this would work if you use spark-submit to submit a job; doing it programmatically is more complex to achieve, and you could only use yarn-client mode, which is tricky to set up remotely.
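If you do want to start the job from code anyway, a hedged sketch of that yarn-client route might look like the following. It assumes HADOOP_CONF_DIR is set as described above and that the cluster nodes can connect back to the machine running the driver, which is exactly what makes this mode tricky remotely.

import org.apache.spark.sql.SparkSession

// Hypothetical sketch, not a verified recipe: in yarn-client mode the driver
// runs on this machine, so executors on the cluster must be able to reach it,
// and Spark's jars must be resolvable by YARN.
val spark = SparkSession.builder()
  .appName("MyAppName")
  .master("yarn")
  .config("spark.submit.deployMode", "client") // cluster mode is not available when starting Spark from code
  .getOrCreate()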