Why is Spark faster than Hadoop MapReduce


Problem Description

Can someone explain, using the word count example, why Spark would be faster than MapReduce?

Recommended Answer

Apache Spark processes data in memory, while Hadoop MapReduce persists data back to disk after each map or reduce action, so Spark should outperform Hadoop MapReduce.
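
To make the word count example concrete, here is a minimal sketch in Spark's Scala API (the HDFS paths and app name are placeholders, not from the original question). The point is that the intermediate results of the transformation chain stay in memory; an equivalent pipeline of MapReduce jobs would write each job's output to HDFS before the next one reads it back.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("WordCount"))

    // Each transformation below is lazy; the intermediate RDDs are
    // kept in memory between steps instead of being persisted to
    // HDFS, which is where a chain of MapReduce jobs pays its disk cost.
    val counts = sc.textFile("hdfs:///input/books") // placeholder path
      .flatMap(line => line.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.saveAsTextFile("hdfs:///output/word-counts") // placeholder path
    sc.stop()
  }
}
```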

Nonetheless, Spark needs a lot of memory. Much like a standard database, it loads a process into memory and keeps it there until further notice, for the sake of caching. If Spark runs on Hadoop YARN alongside other resource-demanding services, or if the data is too big to fit entirely into memory, Spark can suffer major performance degradation.
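
As a hedge against exactly this situation, Spark lets you choose a storage level when caching. A small sketch (reusing the `sc` context from the word count above; the path is a placeholder): with `StorageLevel.MEMORY_AND_DISK`, partitions that don't fit in memory spill to disk instead of being dropped and recomputed.

```scala
import org.apache.spark.storage.StorageLevel

// If the data may not fit entirely in memory, an explicit storage
// level tells Spark to spill overflow partitions to local disk
// rather than evict them and recompute later.
val records = sc.textFile("hdfs:///input/large-dataset") // placeholder path
records.persist(StorageLevel.MEMORY_AND_DISK)

println(records.count()) // the first action materializes the cache
```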

MapReduce, however, kills its processes as soon as a job is done, so it can easily run alongside other services with only minor performance differences.

Spark has the upper hand as long as we're talking about iterative computations that need to pass over the same data many times. But when it comes to one-pass, ETL-like jobs, for example data transformation or data integration, MapReduce is the better fit; that is what it was designed for.
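
Here is a hypothetical sketch of such an iterative computation, again in Scala (the input path and the threshold logic are made up for illustration). The parsed data is cached once, and every later pass scans it from memory, whereas a MapReduce implementation would re-read the input from HDFS on each pass.

```scala
// Parse once, cache, then scan the in-memory copy on every iteration.
val points = sc.textFile("hdfs:///input/points") // placeholder path
  .map(line => line.split(",").map(_.toDouble))
  .cache()

var threshold = 0.0
for (i <- 1 to 10) {
  // Each pass reuses the cached RDD; no re-read from HDFS.
  val above = points.filter(p => p.sum > threshold).count()
  println(s"iteration $i: $above points above threshold $threshold")
  threshold += 1.0
}
```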

Bottom line: Spark performs better when all the data fits in memory, especially on dedicated clusters. Hadoop MapReduce is designed for data that doesn't fit in memory, and it can run well alongside other services.
