What additional benefits does YARN bring to the existing MapReduce?

Problem Description

YARN differs from the original MapReduce architecture in its infrastructure layer in the following way:

In YARN, the responsibilities of the JobTracker are split between a global ResourceManager and a per-application ApplicationMaster, while a per-node NodeManager takes over the role of the TaskTracker. The ResourceManager only manages the allocation of resources to the different applications; its scheduler just schedules them and does no monitoring or status tracking. Resources such as memory, CPU time, and network bandwidth are bundled into a unit called a Resource Container. The ApplicationMasters run on different nodes; each one negotiates containers from the ResourceManager and works with the NodeManagers to launch and monitor the tasks running in those containers.
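
The division of labor between these daemons can be illustrated with a toy, in-process sketch: the ResourceManager's scheduler only grants containers, each NodeManager owns its node's capacity and launches containers on it, and a per-application ApplicationMaster requests containers and decides which task runs in each. All class and method names below are hypothetical; real YARN daemons talk over RPC, and applications use Java client APIs such as `AMRMClient` and `NMClient`. The first-fit placement is also an assumption made for brevity (real clusters use pluggable schedulers such as the Capacity or Fair Scheduler).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Resource:
    """Hypothetical resource vector for a container (memory in MB, virtual cores)."""
    memory_mb: int
    vcores: int

    def fits_in(self, other: "Resource") -> bool:
        return self.memory_mb <= other.memory_mb and self.vcores <= other.vcores

    def minus(self, other: "Resource") -> "Resource":
        return Resource(self.memory_mb - other.memory_mb, self.vcores - other.vcores)


@dataclass
class NodeManager:
    """Per-node agent: tracks the node's free capacity and launches containers on it."""
    name: str
    available: Resource
    running: list = field(default_factory=list)

    def launch(self, container_id: str, task: str) -> None:
        self.running.append((container_id, task))


@dataclass
class ResourceManager:
    """Global daemon: its scheduler only hands out containers and never tracks task progress."""
    nodes: list
    next_id: int = 0

    def allocate(self, app_id: str, request: Resource, count: int) -> list:
        granted = []
        # Assumed first-fit placement: fill each node before moving to the next.
        for node in self.nodes:
            while len(granted) < count and request.fits_in(node.available):
                node.available = node.available.minus(request)
                self.next_id += 1
                granted.append((f"{app_id}_container_{self.next_id}", node))
        return granted


class ApplicationMaster:
    """Per-application master: asks the RM for containers, then has each NM run a task."""

    def __init__(self, app_id: str, rm: ResourceManager):
        self.app_id = app_id
        self.rm = rm

    def run(self, tasks: list, per_task: Resource) -> int:
        containers = self.rm.allocate(self.app_id, per_task, len(tasks))
        for (container_id, node), task in zip(containers, tasks):
            node.launch(container_id, task)
        return len(containers)  # number of tasks that received a container


rm = ResourceManager([NodeManager("nm1", Resource(4096, 4)),
                      NodeManager("nm2", Resource(4096, 4))])
am = ApplicationMaster("app_1", rm)
print(am.run(["map_0", "map_1", "map_2", "reduce_0"], Resource(2048, 1)))  # 4
```

Note that map and reduce tasks receive identical containers here; the ResourceManager never needs to know what kind of task a container will run.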

I want to know how this kind of approach improves performance from the MapReduce perspective. Also, if there is any definitive material on the motivation behind YARN and its benefits over the existing MapReduce implementation, please point me to it.

Recommended Answer

这里有一些文章(1, 23) 关于 YARN.这些讨论了使用 YARN 的好处.

Here are some of the articles (1, 2, 3) about YARN. These talk about the benefits of using YARN.

YARN is more general than MR: it makes it possible to run computing models other than MR, such as BSP. Before YARN, MR, BSP, and other frameworks each required a separate cluster. Now they can coexist in a single cluster, which leads to higher utilization of the cluster. Here are some of the applications ported to YARN.
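
A back-of-the-envelope sketch of why a shared cluster is used more fully than separate per-framework clusters: when one framework's demand peaks while another's is low, a dedicated cluster cannot lend its idle containers. The capacities and demand figures below are made up for illustration, and the model assumes uniformly sized containers while ignoring data locality and scheduler queue limits.

```python
# Hypothetical numbers: containers available to each framework, and the
# number of containers each framework wants during two time intervals.
CAPACITY = {"mapreduce": 50, "bsp": 50}
DEMAND = [
    {"mapreduce": 80, "bsp": 10},  # MapReduce-heavy interval
    {"mapreduce": 20, "bsp": 60},  # BSP-heavy interval
]


def used_partitioned(capacity, demand):
    """Separate clusters: each framework can use only its own containers."""
    return sum(min(capacity[f], demand.get(f, 0)) for f in capacity)


def used_shared(capacity, demand):
    """One YARN cluster: any framework can take any free container."""
    return min(sum(capacity.values()), sum(demand.values()))


def utilization(used_fn, capacity, demand_series):
    """Fraction of container-intervals actually used over the whole series."""
    total_capacity = sum(capacity.values()) * len(demand_series)
    return sum(used_fn(capacity, d) for d in demand_series) / total_capacity


print(utilization(used_partitioned, CAPACITY, DEMAND))  # 0.65
print(utilization(used_shared, CAPACITY, DEMAND))       # 0.85
```

In practice, scheduler queues usually cap how much of the shared pool one framework may borrow, so the real gain depends on how those limits are configured.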

From a MapReduce perspective, legacy MR has separate slots for Map and Reduce tasks, whereas in YARN a container has no fixed purpose. The same container resources can be used for a Map task, a Reduce task, a Hama BSP task, or something else. This leads to better utilization.
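
The slot-versus-container difference can be made concrete with a small calculation. In legacy MR, each TaskTracker has a fixed number of map slots and reduce slots (set with `mapred.tasktracker.map.tasks.maximum` and `mapred.tasktracker.reduce.tasks.maximum`), so during a map-only phase the reduce slots sit idle. The sketch below assumes, for simplicity, that every task fits in one equally sized container and that a node's four slots correspond to four containers; the function names are hypothetical.

```python
def running_with_slots(nodes, map_slots, reduce_slots, pending_maps, pending_reduces):
    """Legacy MR: map tasks may only use map slots, reduce tasks only reduce slots."""
    maps = min(nodes * map_slots, pending_maps)
    reduces = min(nodes * reduce_slots, pending_reduces)
    return maps + reduces


def running_with_containers(nodes, containers_per_node, pending_maps, pending_reduces):
    """YARN: containers are not tied to a task type, so any task can use any free one."""
    return min(nodes * containers_per_node, pending_maps + pending_reduces)


# Map phase of a job on 10 nodes: 100 map tasks pending, no reduces yet.
print(running_with_slots(10, 2, 2, 100, 0))     # 20 of 40 slots busy
print(running_with_containers(10, 4, 100, 0))   # 40 of 40 containers busy
```

When pending work is balanced between maps and reduces, both models keep the same number of tasks running; the container model only pulls ahead when the mix is skewed, which is exactly the map-heavy or reduce-heavy phases of a typical job.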

YARN also makes it possible to run different versions of Hadoop MapReduce in the same cluster, which is not possible with legacy MR; this makes maintenance easier.

Here are some additional links about YARN. Also, Hadoop: The Definitive Guide, 3rd Edition, has an entire section dedicated to YARN.

FYI, the decision to develop YARN instead of adopting one of the existing frameworks, which already did something similar and had been running successfully for years with their bugs ironed out, was somewhat controversial.
