kappa-architecture 和 lambda-architecture 有什么区别 [英] What are the differences between kappa-architecture and lambda-architecture

查看:22
本文介绍了kappa-architecture 和 lambda-architecture 有什么区别的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

如果 Kappa-Architecture 直接对流进行分析而不是将数据拆分为两个流,那么在像 Kafka 这样的消息系统中,数据存储在哪里?还是可以在数据库中重新计算?

单独的批处理层是否比使用流处理引擎重新计算进行批处理更快?

解决方案

要考虑的一个非常简单的情况是算法应用于实时数据和历史数据是一致的.那么就是使用相同的代码库来处理显然非常有益历史和实时数据,从而实现用例使用 Kappa 架构".现在,用于处理的算法历史数据和实时数据并不总是相同的.在一些在这种情况下,批处理算法可以优化,因为它可以访问完整的历史数据集,然后超越实时算法的实现.在这里,选择Lambda 和 Kappa 成为支持批处理执行的选择性能优于代码库的简单性".最后,还有更多复杂的用例,其中甚至实时和批处理算法不同.例如,机器学习批量模型的生成需要大量时间的应用程序和资源,可实时获得的最佳结果是该模型的计算和近似更新.在这种情况下,批处理和实时层不能合并,并且 Lambda必须使用架构".

  • 单独的批处理和流层
  • 代码复杂度更高
  • 通过单独的批处理/流提高性能
  • 更适合批处理和流中的不同算法
  • 使用用于批量计算的数据存储而不是数据库来降低成本

  • 只有一个蒸汽处理层
  • 更易于维护,复杂度更低,用于批处理和的单一算法流
  • 如果从数据库中为批处理重新计算过多的数据会很昂贵
  • 如果从数据库或 kafka 重新计算,处理过多的数据会变慢

If the Kappa-Architecture does analysis on stream directly instead of splitting the data into two streams, where is the datastored then, in a messagin-system like Kafka? or can it be in a database for recomputing?

And is a seperate batch layer faster than recomputing with a stream processing engine for batch analytics?

解决方案

"A very simple case to consider is when the algorithms applied to the real-time data and to the historical data are identical. Then it is clearly very beneficial to use the same code base to process historical and real-time data, and therefore to implement the use-case using the Kappa architecture". "Now, the algorithms used to process historical data and real-time data are not always identical. In some cases, the batch algorithm can be optimized thanks to the fact that it has access to the complete historical dataset, and then outperform the implementation of the real-time algorithm. Here, choosing between Lambda and Kappa becomes a choice between favoring batch execution performance over code base simplicity". "Finally, there are even more complex use-cases, in which even the outputs of the real-time and batch algorithm are different. For example, a machine learning application where generation of the batch model requires so much time and resources that the best result achievable in real-time is computing and approximated updates of that model. In such cases, the batch and real-time layers cannot be merged, and the Lambda architecture must be used".

Quote

  • Seperate Batch and Stream-Layer
  • Higher code complexity
  • Faster performance with seperate batch/stream
  • better for different algorithms in batch and stream
  • cheaper with a data storage for batch-computing instead of a database

  • only a steam processing layer
  • easier to maintain, lower complexity, single algorithm for batch and stream
  • too much data would be expensive if recomputed from a database for batch
  • too much data would be slower to process if recomputed from database or from kafka for batch

这篇关于kappa-architecture 和 lambda-architecture 有什么区别的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆