Spark on Kubernetes vs YARN/Hadoop ecosystem


Question


I see a lot of traction for Spark on Kubernetes. Is it better than running Spark on Hadoop? Both approaches run in a distributed fashion. Can someone help me understand the difference/comparison between running Spark on Kubernetes vs the Hadoop ecosystem?

Thanks

Answer


Can someone help me understand the difference/comparison between running Spark on Kubernetes vs the Hadoop ecosystem?


Be forewarned this is a theoretical answer, because I don't run Spark anymore, and thus I haven't run Spark on kubernetes, but I have maintained both a Hadoop cluster and now a kubernetes cluster, and so I can speak to some of their differences.


Kubernetes is as much a battle-hardened resource manager with API access to all its components as a reasonable person could wish for. It provides very painless declarative resource limitations (both CPU and RAM, plus even syscall capacities), very, very painless log egress (both back to the user via kubectl and out of the cluster using multiple flavors of log-management approaches), an unprecedented level of metrics gathering and egress allowing one to keep an eye on the health of the cluster and the jobs therein, and the list goes on and on.
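To make the "declarative resource limitations" concrete, here is a minimal sketch of what that looks like in a pod spec. The pod name and image are hypothetical placeholders; the `resources` stanza is the standard Kubernetes mechanism being referred to:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: spark-executor-example   # hypothetical name for illustration
spec:
  containers:
  - name: executor
    image: my-spark-image:latest # assumption: an image you have built yourself
    resources:
      requests:                  # what the scheduler reserves for the container
        cpu: "1"
        memory: 2Gi
      limits:                    # hard ceiling enforced at runtime
        cpu: "2"
        memory: 4Gi
```

The painless log egress mentioned above is then simply `kubectl logs <pod-name>`, with no extra plumbing on the cluster side.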


But perhaps the biggest reason one would choose to run Spark on kubernetes is the same reason one would choose to run kubernetes at all: shared resources rather than having to create new machines for different workloads (well, plus all of those benefits above). So if you have a Spark cluster, it is very, very likely it is going to burn $$$ while a job isn't actively running on it, versus kubernetes will cheerfully schedule other jobs onto those Nodes while they aren't running Spark jobs. Yes, I am aware that Mesos and Yarn are "generic" cluster resource managers, but it has not been my experience that they are as painless or ubiquitous as kubernetes.
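For anyone wanting to try this, a rough sketch of a submission using Spark's native Kubernetes support (available since Spark 2.3) looks like the following; the API-server address, image name, and jar path are placeholders you would substitute for your own cluster:

```shell
# Sketch only: submit a Spark job directly to a Kubernetes API server.
# Kubernetes itself schedules the driver and executor pods.
spark-submit \
  --master k8s://https://<k8s-apiserver-host>:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=3 \
  --conf spark.kubernetes.container.image=<your-spark-image> \
  local:///path/to/spark-examples.jar
```

Because the executors are just pods, the nodes they land on are free for any other workload the moment the job finishes, which is the shared-resource point made above.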


I would welcome someone posting the counter narrative, or contributing more hands-on experience of Spark on kubernetes.

