Apache Spark vs. Apache Storm

This article looks at the question of Apache Spark vs. Apache Storm; the answer below should be a useful reference for anyone working through the same question.

Problem Description

What are the differences between Apache Spark and Apache Storm? What are suitable use cases for each one?

Solution

Apache Spark is an in-memory distributed data analysis platform, primarily targeted at speeding up batch analysis jobs, iterative machine learning jobs, interactive queries, and graph processing.

One of Spark's primary distinctions is its use of RDDs or Resilient Distributed Datasets. RDDs are great for pipelining parallel operators for computation and are, by definition, immutable, which allows Spark a unique form of fault tolerance based on lineage information. If you are interested in, for example, executing a Hadoop MapReduce job much faster, Spark is a great option (although memory requirements must be considered).
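To make the RDD idea concrete, here is a minimal pure-Python sketch of immutable transformations that record their lineage. This is an illustration only, not Spark's actual API; the `MiniRDD` class and its methods are hypothetical stand-ins:

```python
# Sketch of an immutable dataset whose transformations record lineage.
# MiniRDD is a hypothetical illustration, not Spark's real RDD API.

class MiniRDD:
    def __init__(self, data, lineage=("source",)):
        self._data = list(data)   # treated as immutable
        self.lineage = lineage    # record of how this dataset was derived

    def map(self, fn):
        # Each transformation returns a NEW dataset; the parent is untouched.
        return MiniRDD((fn(x) for x in self._data),
                       self.lineage + (f"map({fn.__name__})",))

    def filter(self, pred):
        return MiniRDD((x for x in self._data if pred(x)),
                       self.lineage + (f"filter({pred.__name__})",))

    def collect(self):
        return list(self._data)

def double(x): return x * 2
def is_even(x): return x % 2 == 0

base = MiniRDD([1, 2, 3, 4])
derived = base.map(double).filter(is_even)

print(derived.collect())   # [2, 4, 6, 8]
print(derived.lineage)     # ('source', 'map(double)', 'filter(is_even)')
print(base.collect())      # parent unchanged: [1, 2, 3, 4]
```

Because every derived dataset knows its lineage and its parent is never mutated, a lost partition can in principle be rebuilt by replaying the recorded transformations from the source, which is the essence of Spark's lineage-based fault tolerance.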

Apache Storm is focused on stream processing, or what some call complex event processing. Storm implements a fault-tolerant method for performing a computation, or pipelining multiple computations, on an event as it flows into a system. One might use Storm to transform unstructured data into a desired format as it streams into a system.
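As a rough illustration (plain Python, not Storm's actual spout/bolt API; the event format and the `parse_event` helper are hypothetical), per-event processing transforms each event the moment it arrives, with no batching:

```python
# Sketch of per-event stream transformation, Storm-style: each raw event
# is normalized into a structured format as it flows in, one at a time.

def parse_event(raw):
    # Turn an unstructured "user: ACTION" line into a structured record.
    user, _, action = raw.partition(":")
    return {"user": user.strip(), "action": action.strip().lower()}

def process_stream(events):
    for raw in events:           # one event at a time, as it arrives
        yield parse_event(raw)   # transformed immediately, no batching

incoming = ["alice: LOGIN", "bob: CLICK", "alice: LOGOUT"]
for record in process_stream(incoming):
    print(record)
```

In Storm itself this per-event pipeline would be expressed as a topology of spouts and bolts distributed across workers; the sketch only captures the one-event-at-a-time processing model.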

Storm and Spark are focused on fairly different use cases. The more "apples-to-apples" comparison would be between Storm and Spark Streaming. Since Spark's RDDs are inherently immutable, Spark Streaming implements a method for "batching" incoming updates in user-defined time intervals that get transformed into their own RDDs. Spark's parallel operators can then perform computations on these RDDs. This is different from Storm which deals with each event individually.
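The micro-batching idea can be sketched in plain Python (illustrative only; the interval grouping below stands in for Spark Streaming's actual DStream mechanics, and the timestamps are hypothetical arrival times):

```python
# Sketch of Spark-Streaming-style micro-batching: events arriving within
# the same user-defined interval are grouped into one batch, and each
# batch is then processed as a whole (in Spark, each becomes its own RDD).

BATCH_INTERVAL = 2.0  # seconds per batch (user-defined)

def micro_batches(timestamped_events, interval=BATCH_INTERVAL):
    batches = {}
    for t, event in timestamped_events:
        batches.setdefault(int(t // interval), []).append(event)
    return [batches[k] for k in sorted(batches)]

# (arrival_time, value) pairs
events = [(0.1, 1), (0.9, 2), (2.3, 3), (3.5, 4), (4.2, 5)]

for batch in micro_batches(events):
    # a parallel operator would run over each batch here; sum() stands in
    print(batch, "-> sum:", sum(batch))
```

Note the contrast with the per-event model: nothing is processed until its interval closes, so latency is bounded below by the batch interval, which is the usual tradeoff cited between Spark Streaming and Storm.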

One key difference between these two technologies is that Spark performs Data-Parallel computations while Storm performs Task-Parallel computations. Either design makes tradeoffs that are worth knowing. I would suggest checking out these links.
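The distinction can be caricatured in plain Python: data-parallel means the same operation runs over partitions of the data, while task-parallel means different stages of work are wired together and each item passes through every stage. Both halves below are simplified sketches, not either framework's API:

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    return x * x

# Data-parallel (Spark-like): the SAME function is applied across
# partitions of the dataset, potentially on many workers at once.
data = [1, 2, 3, 4, 5, 6]
partitions = [data[0:3], data[3:6]]
with ThreadPoolExecutor() as pool:
    results = pool.map(lambda part: [square(x) for x in part], partitions)
data_parallel = [y for part in results for y in part]
print(data_parallel)  # [1, 4, 9, 16, 25, 36]

# Task-parallel (Storm-like): DIFFERENT tasks (stages) form a pipeline,
# and each event passes through every stage as it arrives.
stages = [str.strip, str.lower, lambda s: s.replace(" ", "_")]

def run_pipeline(event):
    for stage in stages:   # each stage is a distinct task
        event = stage(event)
    return event

print(run_pipeline("  Hello World  "))  # hello_world
```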

Edit: discovered this today

This concludes the article on Apache Spark vs. Apache Storm; we hope the answer above is helpful.
