What is the difference between Spark Standalone, YARN and local mode?


Problem description

Spark Standalone:

In this mode, I realize that you run your master and worker nodes on your local machine.

Does that mean you have an instance of YARN running on your local machine? When I installed Spark it came bundled with Hadoop, and YARN usually ships with Hadoop as well, correct? And in this mode I can essentially simulate a smaller version of a full-blown cluster.

Spark local mode:

This is the part I am also confused about. To run in this mode I do val conf = new SparkConf().setMaster("local[2]").

In this mode, it doesn't use any type of resource manager (like YARN), correct? It simply runs the Spark job in the number of threads you provide to "local[2]"?
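For concreteness, here is a minimal, self-contained sketch of the local-mode setup described above; the object name LocalExample and the toy job are assumptions for illustration, not part of the original question.

import org.apache.spark.{SparkConf, SparkContext}

// Minimal local-mode sketch; "LocalExample" is a hypothetical name.
object LocalExample {
  def main(args: Array[String]): Unit = {
    // "local[2]" runs the driver and the executor threads inside this one JVM, using 2 threads.
    val conf = new SparkConf().setAppName("LocalExample").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Trivial job: sum the numbers 1..100 across the two local threads.
    val total = sc.parallelize(1 to 100).reduce(_ + _)
    println(s"sum = $total")

    sc.stop()
  }
}

No cluster manager is involved here; the whole application runs as a single JVM process.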

Recommended answer

You are confusing Hadoop YARN and Spark.

YARN is a software rewrite that decouples MapReduce's resource management and scheduling capabilities from the data processing component, enabling Hadoop to support more varied processing approaches and a broader array of applications.

With the introduction of YARN, Hadoop has been opened up to run other applications on the platform.

In short, YARN is a "pluggable data-parallel framework".

Apache Spark

Apache Spark is a batch, interactive, and streaming framework. Spark has a "pluggable persistent store" and can run with any persistence layer.

For Spark to run it needs resources. In standalone mode you start the workers and the Spark master yourself, and the persistence layer can be anything: HDFS, a plain file system, Cassandra, etc. In YARN mode you are asking the YARN/Hadoop cluster to manage the resource allocation and bookkeeping.
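As a rough sketch of that difference, the application code stays the same and only the master it points at changes. The host name and port below are placeholders, and in practice the master URL is usually passed via spark-submit rather than hard-coded.

import org.apache.spark.SparkConf

// Hypothetical master URLs for illustration only.
val localConf      = new SparkConf().setAppName("Example").setMaster("local[2]")                 // threads inside one JVM
val standaloneConf = new SparkConf().setAppName("Example").setMaster("spark://master-host:7077") // Spark's own master and workers
val yarnConf       = new SparkConf().setAppName("Example").setMaster("yarn")                     // resources managed by the YARN/Hadoop cluster

The job logic does not change between these; only the component that hands out executors does.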

When you set the master to local[2], you ask Spark to use 2 cores and to run the driver and workers in the same JVM. In local mode, all Spark job-related tasks run in the same JVM.

So the only difference between standalone and local mode is that in standalone mode you define "containers" for the workers and the Spark master to run on your machine (so you can have 2 workers, and your tasks can be distributed across the JVMs of those two workers), but in local mode you are just running everything in the same JVM on your local machine.
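One way to see this for yourself is to print the JVM name ("pid@host") from the driver and from the tasks. The sketch below is a hypothetical illustration (the object name WhereDoTasksRun is assumed); with local[2] both names come out the same, whereas against a standalone master with separate workers the task-side names belong to the executor JVMs.

import java.lang.management.ManagementFactory
import org.apache.spark.{SparkConf, SparkContext}

// Compare the JVM the driver runs in with the JVM(s) the tasks run in.
object WhereDoTasksRun {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WhereDoTasksRun").setMaster("local[2]")
    val sc = new SparkContext(conf)

    val driverJvm = ManagementFactory.getRuntimeMXBean.getName
    val taskJvms = sc.parallelize(1 to 4, 4)
      .map(_ => ManagementFactory.getRuntimeMXBean.getName)
      .distinct()
      .collect()

    println(s"driver JVM:  $driverJvm")                 // e.g. 12345@my-laptop
    println(s"task JVM(s): ${taskJvms.mkString(", ")}") // same value in local mode

    sc.stop()
  }
}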

