Under what conditions should cluster deploy mode be used instead of client?

Question

The documentation at https://spark.apache.org/docs/1.1.0/submitting-applications.html describes deploy mode as:

--deploy-mode: Whether to deploy your driver on the worker nodes (cluster) or locally as an external client (client) (default: client)

Using this diagram (fig1) as a guide (taken from http://spark.apache.org/docs/1.2.0/cluster-overview.html):

If I kick off a Spark job:

./bin/spark-submit \
  --class com.driver \
  --master spark://MY_MASTER:7077 \
  --executor-memory 845M \
  --deploy-mode client \
  ./bin/Driver.jar

then the Driver Program will be MY_MASTER, as shown in fig1.

If I use --deploy-mode cluster instead, will the Driver Program be shared among the Worker Nodes? If so, does this mean that the Driver Program box in fig1 can be dropped (as it is no longer used), because the SparkContext will also be shared among the worker nodes?

Under what conditions should cluster be used instead of client?

Recommended answer

No. When deploy-mode is client, the Driver Program is not necessarily the master node. You could run spark-submit on your laptop, and the Driver Program would run on your laptop.

In contrast, when deploy-mode is cluster, the cluster manager (master node) is used to find a worker with enough available resources to run the Driver Program. As a result, the Driver Program runs on one of the worker nodes. Because its execution is delegated, you cannot get the result directly from the Driver Program; it must store its results in a file, a database, etc.
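As a rough illustration, the only change on the command line between the two modes is the value of --deploy-mode. The sketch below reuses the class, master URL, and jar path from the question; build_submit_cmd is a hypothetical helper name, and it only prints the command (a dry run) rather than submitting anything.

```shell
# Sketch only: build_submit_cmd is a hypothetical helper, not part of Spark.
# MY_MASTER, com.driver and Driver.jar come from the question above.
build_submit_cmd() {
  mode="$1"   # expected: "client" or "cluster"
  case "$mode" in
    client|cluster) ;;
    *) echo "unknown deploy mode: $mode" >&2; return 1 ;;
  esac
  # Print every argument followed by a space, then the jar on the same line.
  printf '%s ' ./bin/spark-submit \
    --class com.driver \
    --master spark://MY_MASTER:7077 \
    --executor-memory 845M \
    --deploy-mode "$mode"
  printf '%s\n' ./bin/Driver.jar
}

# Dry run: show the command that would launch the driver on a worker node.
build_submit_cmd cluster
```

Note that in cluster mode the driver starts on a worker, so the application jar generally has to be reachable from the cluster (for example, a path present on every node or a shared filesystem URL); the relative path here is kept only to match the question.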

  • Client mode
    • You want to get the job result back (dynamic analysis)
    • Easier for developing/debugging
    • You control where your Driver Program runs
    • Always-up application: expose your Spark job launcher as a REST service or a Web UI
  • Cluster mode
    • Easier resource allocation (let the master decide): fire and forget
    • Monitor your Driver Program from the Master Web UI, like other workers
    • Stops at the end: once a job finishes, its allocated resources are freed
