Spark Standalone: Differences between client and cluster deploy modes


Question

TL;DR: In a Spark Standalone cluster, what are the differences between client and cluster deploy modes? How do I set which mode my application runs in?


We have a Spark Standalone cluster with three machines, all of them with Spark 1.6.1:

  • A master machine, which is also where we launch our application with spark-submit
  • 2 identical worker machines

From the Spark Documentation, I read:

(...) For standalone clusters, Spark currently supports two deploy modes. In client mode, the driver is launched in the same process as the client that submits the application. In cluster mode, however, the driver is launched from one of the Worker processes inside the cluster, and the client process exits as soon as it fulfills its responsibility of submitting the application without waiting for the application to finish.

However, I don't really understand the practical differences from reading this, and I don't see what the advantages and disadvantages of each deploy mode are.

Additionally, when I start my application using spark-submit, even if I set the property spark.submit.deployMode to "cluster", the Spark UI for my context shows the following entry:

[Screenshot of the Spark context UI; image not preserved]

So I am not able to test both modes to see the practical differences. That being said, my questions are:

1) What are the practical differences between Spark Standalone client deploy mode and cluster deploy mode? What are the pros and cons of using each one?

2) How do I choose which one my application is going to run in, using spark-submit?

Answer

What are the practical differences between Spark Standalone client deploy mode and cluster deploy mode? What are the pros and cons of using each one?

Let's try to look at the differences between client and cluster mode.

Client:

  • The driver runs on the machine that runs spark-submit (in your setup, the Master node), inside a dedicated process. This means it has all of that machine's available resources at its disposal to execute work.
  • The driver opens up a dedicated Netty HTTP server and distributes the specified JAR files to all Worker nodes (big advantage).
  • Because the Master node has dedicated resources of its own, you don't need to "spend" worker resources on the driver program.
  • If the driver process dies, you need an external monitoring system to restart it.

Cluster:

  • The driver runs on one of the cluster's Worker nodes. The worker is chosen by the Master leader.
  • The driver runs as a dedicated, standalone process inside the Worker.
  • The driver program takes up at least 1 core and a dedicated amount of memory from one of the workers (this can be configured).
  • The driver program can be monitored from the Master node using the --supervise flag and restarted in case it dies.
  • When working in cluster mode, all JARs related to the execution of your application need to be publicly available to all the workers. This means you can either manually place them in a shared location or in a folder on each of the workers.
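As a rough illustration of the cluster-mode points above, here is a minimal sketch of a supervised submission. The master host, main class, and JAR path are hypothetical placeholders, and the JAR path is assumed to exist at the same location on every worker. The function only prints the command so it can be reviewed before being run by hand.

```shell
#!/bin/sh
# Sketch only: all three values below are hypothetical placeholders.
MASTER_URL="spark://master-host:7077"        # assumed standalone master URL
MAIN_CLASS="com.example.MyApp"               # assumed application entry point
APP_JAR="file:///opt/spark-apps/my-app.jar"  # assumed to be present on every worker

# Build the cluster-mode command one group of flags at a time and print it.
cluster_submit_cmd() {
  set -- ./bin/spark-submit --class "$MAIN_CLASS" --master "$MASTER_URL"
  set -- "$@" --deploy-mode cluster   # start the driver on one of the workers
  set -- "$@" --supervise             # have the Master restart the driver if it exits with a non-zero code
  set -- "$@" --driver-memory 1g      # memory the driver takes from the chosen worker
  set -- "$@" --driver-cores 1        # cores the driver takes from the chosen worker
  set -- "$@" "$APP_JAR"
  echo "$*"
}

cluster_submit_cmd
```

The --driver-memory and --driver-cores values shown are only examples of how the driver's share of a worker could be tuned.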

Which one is better? Not sure; that's for you to experiment with and decide. There is no universally better choice here. Each mode gives you something, and it's up to you to see which one works better for your use case.

How do I choose which one my application is going to run in, using spark-submit?

The way to choose which mode to run in is by using the --deploy-mode flag. From the Spark Configuration page:

  ./bin/spark-submit \
    --class <main-class> \
    --master <master-url> \
    --deploy-mode <deploy-mode> \
    --conf <key>=<value> \
    ... # other options
    <application-jar> \
    [application-arguments]
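To make the template concrete, the sketch below fills it in for both modes. The master URL, class name, and JAR path are hypothetical placeholders rather than values from the question. The helper only prints the command, so the two variants can be compared side by side. If --deploy-mode is omitted, spark-submit defaults to client.

```shell
#!/bin/sh
# Sketch only: master URL, main class, and JAR path are hypothetical placeholders.
submit_cmd() {
  mode="$1"
  # --deploy-mode accepts exactly two values.
  case "$mode" in
    client|cluster) ;;
    *) echo "deploy mode must be 'client' or 'cluster'" >&2; return 1 ;;
  esac
  echo "./bin/spark-submit --class com.example.MyApp --master spark://master-host:7077 --deploy-mode $mode /opt/spark-apps/my-app.jar"
}

submit_cmd client    # driver starts inside the spark-submit process on this machine
submit_cmd cluster   # driver starts on a worker; spark-submit exits after submitting
```

The only difference between the two printed commands is the value passed to --deploy-mode. Everything else about where the driver ends up follows from that flag.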
