What are the differences between the Kappa architecture and the Lambda architecture?


Question

If the Kappa architecture does analysis on the stream directly instead of splitting the data into two streams, where is the data stored then? In a messaging system like Kafka? Or can it be in a database for recomputing?

And is a separate batch layer faster than recomputing with a stream processing engine for batch analytics?

Recommended answer

"A very simple case to consider is when the algorithms applied to the real-time data and to the historical data are identical. Then it is clearly very beneficial to use the same code base to process historical and real-time data, and therefore to implement the use-case using the Kappa architecture".

"Now, the algorithms used to process historical data and real-time data are not always identical. In some cases, the batch algorithm can be optimized thanks to the fact that it has access to the complete historical dataset, and then outperform the implementation of the real-time algorithm. Here, choosing between Lambda and Kappa becomes a choice between favoring batch execution performance over code base simplicity".

"Finally, there are even more complex use-cases, in which even the outputs of the real-time and batch algorithm are different. For example, a machine learning application where generation of the batch model requires so much time and resources that the best result achievable in real-time is computing approximated updates of that model. In such cases, the batch and real-time layers cannot be merged, and the Lambda architecture must be used".

Quote

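Lambda:

  • separate batch and stream layers
  • higher code complexity
  • faster performance with separate batch and stream processing
  • better suited to using different algorithms for batch and stream
  • cheap, using a data store rather than a database for batch computation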
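
The Lambda points above can be illustrated with a minimal sketch. It is not taken from the answer: the names `batch_view`, `SpeedLayer`, and `query` are hypothetical, the events are made-up dictionaries, and plain Python lists stand in for the batch data store and the incoming stream. The point it shows is that two separate code paths exist and their results are merged at query time.

```python
from collections import Counter


def batch_view(master_dataset):
    """Batch layer (hypothetical): recompute counts from the full historical dataset."""
    return Counter(event["key"] for event in master_dataset)


class SpeedLayer:
    """Speed layer (hypothetical): count events that arrived after the last batch run."""

    def __init__(self):
        self.view = Counter()

    def on_event(self, event):
        # Incremental update, one event at a time.
        self.view[event["key"]] += 1


def query(key, batch, speed):
    """Serving step: merge the batch view with the real-time view."""
    return batch[key] + speed.view[key]


# Assumed sample data: a list stands in for the batch store and for the stream.
historical = [{"key": "a"}, {"key": "b"}, {"key": "a"}]
recent = [{"key": "a"}, {"key": "c"}]

batch = batch_view(historical)
speed = SpeedLayer()
for event in recent:
    speed.on_event(event)

print(query("a", batch, speed))
```

The counting logic is written twice here, once per layer. That duplication is the "higher code complexity" cost, and it is also what allows each layer to use a different algorithm.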

Kappa:

  • only a stream processing layer
  • easier to maintain, lower complexity, single algorithm for batch and stream
  • too much data would be expensive if recomputed from a database for batch
  • too much data would be slower to process if recomputed from a database or from Kafka for batch
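
A minimal sketch of the Kappa idea, under stated assumptions: the in-memory list `log` stands in for a retained, replayable log (for example a Kafka topic configured to keep its history), and `StreamJob` is a hypothetical name, not an API of any library. It shows a single code path serving both live processing and recomputation, where recomputing simply means starting a fresh job at offset 0.

```python
from collections import Counter


class StreamJob:
    """Hypothetical stream job: the only processing code in the system."""

    def __init__(self):
        self.view = Counter()
        self.offset = 0  # position of the next unread record in the log

    def poll(self, log):
        """Consume every record appended since the last poll."""
        for event in log[self.offset:]:
            self.view[event["key"]] += 1
        self.offset = len(log)


# Assumed stand-in for a retained log; real systems would read from a broker.
log = [{"key": "a"}, {"key": "b"}]

live = StreamJob()
live.poll(log)            # process what is already in the log
log.append({"key": "a"})  # a new event arrives
live.poll(log)            # process only the new record

# "Batch" recomputation: a fresh instance of the same job replays from offset 0.
replayed = StreamJob()
replayed.poll(log)

print(live.view == replayed.view)
```

The replay has to read the whole log again, which is where the cost and speed concerns in the last two bullets come from when the history is large.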
