Distributed caching in Storm

Question

How can I store temporary data in Apache Storm?

In a Storm topology, a bolt needs access to previously processed data.

For example: the bolt processes variable1 with a value of 20 at 10:00 AM. If variable1 is received again as 50 at 10:15 AM, the result should be 30 (50 - 20). Later, if variable1 is received as 70 at 10:30 AM, the result should be 20 (70 - 50).

How can this be implemented?

Recommended answer

In short, you want to do micro-batch style calculations over Storm's running tuples. First, define or find a key in the tuples (for example, the variable name). Use fields grouping on that key between the bolts (don't use shuffle grouping). This guarantees that tuples with the same key are always sent to the same task of the downstream bolt. Then define a collection (List/Map) as a field of the bolt class to hold the old values, and add each new value to it for the calculation. You don't need to worry about thread safety: each task has its own bolt instance, so the collection is not shared between executors of the same bolt, provided it is an instance field and not a static one.
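As a rough illustration of the "keep the old value in a map" step, here is a minimal sketch in plain Java with no Storm dependency. `DeltaTracker` and `update` are hypothetical names made up for this example, not part of Storm. In an actual bolt, the map would presumably be an instance field of the bolt, `update` would be called from `execute` with the key and value read from the tuple, and the bolt would be wired to its upstream component with `fieldsGrouping` on the key field.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical helper (not a Storm class): remembers the last value seen
// for each key and reports the difference from the previous value.
class DeltaTracker {
    // Instance field, not static: in a bolt, each task would own its own copy.
    private final Map<String, Long> lastValues = new HashMap<>();

    // Returns empty for the first value of a key, otherwise current - previous.
    OptionalLong update(String key, long value) {
        Long previous = lastValues.put(key, value); // put returns the old value or null
        return previous == null
                ? OptionalLong.empty()
                : OptionalLong.of(value - previous);
    }

    public static void main(String[] args) {
        DeltaTracker tracker = new DeltaTracker();
        System.out.println(tracker.update("variable1", 20)); // first value, no delta yet
        System.out.println(tracker.update("variable1", 50)); // delta is 50 - 20
        System.out.println(tracker.update("variable1", 70)); // delta is 70 - 50
    }
}
```

Note that state kept this way lives only in the memory of the worker process, so it would be lost if the worker restarts. If that matters, the values would need to be written to an external store as well.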
