MapReduce or Spark?


Question

I have tested Hadoop and MapReduce with Cloudera and found them pretty cool. I thought it was the most recent and relevant Big Data solution. But a few days ago I found this: https://spark.incubator.apache.org/

It describes itself as a "lightning-fast cluster computing system" that can run on top of a Hadoop cluster and can apparently crush MapReduce. I saw that it works more in RAM than MapReduce does. I think MapReduce is still relevant when you need cluster computing to overcome the I/O problems you would have on a single machine. But since Spark can do the jobs that MapReduce does, and may be far more efficient on several operations, isn't this the end of MapReduce? Or is there something more that MapReduce can do, or can MapReduce be more efficient than Spark in certain contexts?

Recommended answer

MapReduce is batch-oriented by nature, so frameworks built on top of MR, such as Hive and Pig, are batch-oriented as well. For iterative processing, as in machine learning, and for interactive analysis, Hadoop/MR does not meet the requirement. A Cloudera article on why Spark matters summarizes this nicely: http://blog.cloudera.com/blog/2014/03/why-apache-spark-is-a-crossover-hit-for-data-scientists/
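To make the point about iterative processing concrete, here is a toy sketch in plain Python. It is not Hadoop or Spark code, and every name in it (`CountingStorage`, `batch_style`, `cached_style`) is made up for illustration. It only assumes that a batch-style job rescans its input from storage on every iteration, while an in-memory approach loads the input once and reuses it.

```python
# Toy model only: counts full scans of the input in a batch-per-iteration
# style versus a load-once, keep-in-memory style. Nothing here is a real
# Hadoop or Spark API; all names are hypothetical.

class CountingStorage:
    """Pretend distributed storage that counts full scans of the dataset."""

    def __init__(self, records):
        self._records = list(records)
        self.scans = 0

    def scan(self):
        """Return a copy of all records and record that a full scan happened."""
        self.scans += 1
        return list(self._records)


def step(points, estimate):
    """One iteration of a simple update that moves the estimate toward the mean."""
    mean = sum(points) / len(points)
    return estimate + 0.5 * (mean - estimate)


def batch_style(storage, iterations):
    """Each iteration behaves like an independent job that rescans storage."""
    estimate = 0.0
    for _ in range(iterations):
        estimate = step(storage.scan(), estimate)
    return estimate


def cached_style(storage, iterations):
    """Scan storage once, then iterate over the in-memory working set."""
    points = storage.scan()
    estimate = 0.0
    for _ in range(iterations):
        estimate = step(points, estimate)
    return estimate
```

Both functions compute the same estimate. The difference is how often `scan()` is called: once per iteration in `batch_style`, and once in total in `cached_style`. That repeated input scan is the kind of overhead that makes iterative workloads a poor fit for a purely batch-oriented model.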

It's not the end of MR. As of this writing, Hadoop is much more mature than Spark, and many vendors support it. That will change over time. Cloudera has started including Spark in CDH, and over time more vendors will include it in their Big Data distributions and provide commercial support for it. We will see MR and Spark side by side for the foreseeable future.

Also, with Hadoop 2 (aka YARN), MR and other models (including Spark) can run on a single cluster. So Hadoop is not going anywhere.
