Apache Storm compared to Hadoop

Question

How does Storm compare to Hadoop? Hadoop seems to be the de facto standard for open-source, large-scale batch processing. Does Storm have any advantages over Hadoop, or are they completely different?

Recommended answer

Why don't you share your own opinion?

Twitter Storm has been touted as real-time Hadoop. That is more of a marketing line for easy consumption.

They are superficially similar, since both are distributed application solutions. Apart from the typical distributed architectural elements, such as master/slave roles and ZooKeeper-based coordination, the comparison falls off a cliff, in my view.

Twitter Storm is more like a pipeline for processing data as it arrives. The pipe is what connects the various computing nodes that receive data, compute, and deliver output. (Their lingo is spouts and bolts.) Extend this analogy to a complex pipeline wiring that can be re-engineered when required, and you get Twitter Storm.
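To make the pipeline analogy concrete, here is a minimal sketch in plain Python. It does not use the Storm API; the names sentence_spout, split_bolt, count_bolt, and run_topology are hypothetical and chosen only for illustration. It assumes a spout is a source that emits items one at a time and a bolt is a stage that consumes items and emits results, which is roughly what the answer describes.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def sentence_spout(source: Iterable[str]) -> Iterator[str]:
    """Spout (hypothetical): emits sentences one at a time as they arrive."""
    for sentence in source:
        yield sentence


def split_bolt(sentences: Iterable[str]) -> Iterator[str]:
    """Bolt (hypothetical): consumes each sentence and emits its words."""
    for sentence in sentences:
        for word in sentence.lower().split():
            yield word


def count_bolt(words: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Bolt (hypothetical): keeps a running count and emits an update per word."""
    counts = Counter()
    for word in words:
        counts[word] += 1
        yield word, counts[word]


def run_topology(source: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Wire spout -> split bolt -> count bolt; the wiring can be rearranged."""
    return count_bolt(split_bolt(sentence_spout(source)))


# Each update is produced as soon as its input item flows through the pipe,
# without waiting for the whole input to be available.
updates = list(run_topology(["storm processes streams", "hadoop processes batches"]))
for word, running_count in updates:
    print(word, running_count)
```

Because every stage is a generator, a result comes out as soon as one input item has passed through, which is the property the pipeline analogy is meant to capture. A real Storm topology would distribute these stages across machines; this sketch runs in a single process.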

In a nutshell, it processes data as it arrives instead of waiting for a batch job, so latency is minimal.

Hadoop, however, is different in this respect, primarily because of HDFS. It is a solution geared toward distributed storage and tolerance of outages at many scales (disks, machines, racks, etc.).

M/R (MapReduce) is built to leverage data locality on HDFS to distribute computational jobs. Together, they do not provide a facility for real-time data processing. But that is not always a requirement when you are looking through large data (the needle-in-a-haystack analogy).
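For contrast, the batch model can be sketched as follows, again in plain Python rather than the Hadoop API. The names map_phase, shuffle_phase, reduce_phase, and run_batch_job are hypothetical. The sketch assumes the whole dataset is already stored before the job starts, and it leaves out HDFS and data locality entirely.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(records: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map (hypothetical): emit a (word, 1) pair for every word in every record."""
    for record in records:
        for word in record.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle (hypothetical): group all values by key; needs every pair first."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce (hypothetical): collapse each group of values into one total."""
    return {key: sum(values) for key, values in groups.items()}


def run_batch_job(records: Iterable[str]) -> Dict[str, int]:
    """Run map -> shuffle -> reduce over a complete, already stored dataset."""
    return reduce_phase(shuffle_phase(map_phase(records)))


# The result exists only after the entire input has been read and grouped.
result = run_batch_job(["storm processes streams", "hadoop processes batches"])
print(result)
```

The final counts match those from a streaming word count over the same input, but here nothing is available until the whole job has finished. That delay is acceptable when the goal is to search through a large stored dataset.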

In short, Twitter Storm is a distributed real-time data processing solution. I don't think we should compare them. Twitter built it because it needed a facility to process tweets that are individually small but humongous in number, and to do so in real time.

See HStreaming if you are compelled to compare it with something.
