Distributed caching in Storm


Question

How can temporary data be stored in Apache Storm?

In a Storm topology, a bolt needs to access data that it processed previously.

For example, suppose the bolt processes variable1 with a value of 20 at 10:00 AM.

If variable1 is then received as 50 at 10:15 AM, the result should be 30 (50 - 20).

If variable1 is later received as 70 at 10:30 AM, the result should be 20 (70 - 50).

How can this be implemented?

Recommended answer

In short, you want to do micro-batching calculations on Storm's running tuples. First, define or find a key in the tuples. Then use fields grouping on that key between the bolts (don't use shuffle grouping). This guarantees that tuples with the same key are always sent to the same task of the downstream bolt. In that bolt, define a class-level collection (a List or Map) to hold the old values, and add each new value to it for the calculation. You don't need to worry about thread safety between the different executors of the same bolt: each task works on its own bolt instance, so a collection held in an instance (non-static) field is not shared between them.
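The per-key state described above can be sketched in plain Java. This is only an illustration, not Storm code: DeltaTracker and its update method are hypothetical names, and returning an empty result for the first value of a key is an assumption, since the question does not say what the first result should be. In a real topology the map would presumably live as an instance field of the bolt, update would be called from the bolt's execute method, and the bolt would be wired with fields grouping on the key field.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical helper, not part of the Storm API: remembers the last value
// seen for each key and reports the change since that value.
class DeltaTracker {
    // Inside a bolt this would be an instance (non-static) field, so each
    // task keeps its own map and no synchronization is needed.
    private final Map<String, Long> lastValues = new HashMap<>();

    // Stores the current value and returns current - previous for the key.
    // Assumption: the first observation of a key yields an empty result.
    OptionalLong update(String key, long current) {
        Long previous = lastValues.put(key, current);
        return previous == null
                ? OptionalLong.empty()
                : OptionalLong.of(current - previous);
    }

    public static void main(String[] args) {
        DeltaTracker tracker = new DeltaTracker();
        System.out.println(tracker.update("variable1", 20)); // first value, no delta yet
        System.out.println(tracker.update("variable1", 50)); // 50 - 20
        System.out.println(tracker.update("variable1", 70)); // 70 - 50
    }
}
```

Because the map lives only in memory, its contents are lost if the worker process restarts, so this sketch suits temporary data like the values in the question.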
