Lambda Architecture - Why a batch layer


Question

I am going through the Lambda Architecture and trying to understand how it can be used to build fault-tolerant big data systems.

I am wondering how the batch layer is useful when everything could be stored in the realtime view and the results generated from it. Is it because realtime storage cannot be used to store all of the data? If it were, it would no longer be realtime, since the time taken to retrieve data depends on how much data is stored.

Recommended answer

Why a batch layer

To save time and money!

It basically has two functions:

  • Manage the master dataset (assumed to be immutable)
  • Precompute the batch views for ad-hoc querying
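
As a rough illustration of these two functions, here is a minimal Python sketch. The `PageView` event type, the `append_event` and `compute_batch_view` helpers, and the page-view-count view are all hypothetical, chosen only for illustration; a real batch layer would run a distributed job over distributed storage rather than over an in-memory tuple.

```python
from collections import Counter
from typing import NamedTuple


class PageView(NamedTuple):
    """One immutable raw event (hypothetical schema)."""
    url: str
    timestamp: int


def append_event(master: tuple, event: PageView) -> tuple:
    """Function 1: the master dataset is append-only.

    Existing records are never updated or deleted; a new tuple with the
    extra event is returned instead.
    """
    return master + (event,)


def compute_batch_view(master: tuple) -> dict:
    """Function 2: precompute a view (views per URL) over ALL the data.

    This is the expensive full scan, run periodically by the batch layer.
    """
    return dict(Counter(event.url for event in master))


master = ()
master = append_event(master, PageView("/home", 1))
master = append_event(master, PageView("/home", 2))
master = append_event(master, PageView("/about", 3))

batch_view = compute_batch_view(master)
```

Queries then read `batch_view` directly instead of rescanning `master`.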

"Everything can be stored in the realtime view and the results generated from it" - not true

The above is certainly possible, but not feasible: the data could run to hundreds or thousands of petabytes, and generating results could take a lot of time.

The key is to attain low-latency queries over a large dataset. The batch layer creates batch views, which can be queried with low latency, and the realtime layer handles recent or updated data, which is usually small. Any ad-hoc query can then be answered by merging results from the batch views and the realtime views instead of computing over the entire master dataset.
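
The merge step might look like the following minimal Python sketch. The view contents and the `query_page_views` function are hypothetical, and it assumes the metric is an additive count, so merging is a simple sum; other metrics (for example unique visitors) would need a different merge function.

```python
# Hypothetical precomputed batch view: covers everything up to the last batch run.
batch_view = {"/home": 1_000_000, "/about": 52_000}

# Hypothetical realtime view: covers only events that arrived since that run.
realtime_view = {"/home": 42, "/pricing": 7}


def query_page_views(url: str, batch: dict, realtime: dict) -> int:
    """Answer a query with two cheap lookups instead of a full scan.

    Assumes counts are additive, so the merged result is the sum of the
    batch and realtime counts; a URL missing from a view counts as 0.
    """
    return batch.get(url, 0) + realtime.get(url, 0)


home_views = query_page_views("/home", batch_view, realtime_view)
pricing_views = query_page_views("/pricing", batch_view, realtime_view)
```

Each query costs two dictionary lookups regardless of how large the master dataset is, which is where the low latency comes from.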

Also, think of a query (even the same query) running again and again over a huge dataset: a loss of time and money!
