Proper way to ACK in Storm in a chain of bolts


Question

I want to make sure I understand how acking works in Storm. I have one spout and two bolts chained together: the spout emits a tuple to Bolt1, which in turn emits a tuple to Bolt2. I want Bolt2 to ack the initial tuple sent from the spout, and I'm not sure how.

To guarantee fault tolerance (i.e. tuples are resent), I want Bolt2 to ack the tuple emitted by the spout, so that if it fails somewhere along the way it can be resent.

Consider this example:

Spout:

 _collector.emit(new Values(queue.dequeue()))

Bolt1:

def execute(tuple: Tuple) {
 _collector.emit(tuple, new Values("stuff"))
}

At this point tuple is the tuple sent by the spout, and I can ack it here without any problem. Now add another bolt that listens for the tuples emitted by Bolt1.

Bolt2:

def execute(tuple2: Tuple) {
 _collector.emit(tuple2, new Values("foo"))
}

At this point tuple2 is the tuple sent from Bolt1 (the one containing the string "stuff"). So if I send an ack in Bolt2, it will ack the tuple from Bolt1, not the one sent from the spout. Correct?

How can I ack the tuple that was sent from the spout? Should I piggyback the initial tuple on every tuple emitted downstream, so that I can retrieve it in the last bolt and ack it there?

I read Nathan's tutorials and got the impression that I could ack the tuple received in Bolt1 (from the spout) right there, after emitting tuple2. This would link the newly emitted tuple2 to the original tuple sent by the spout, so that when Bolt2 acks tuple2 it actually acks the original tuple from the spout. Is this true?

Let me know if I'm missing something in my explanation.

Recommended answer

For those interested, I found a solution by asking on the Storm group. The spout needs to emit each tuple together with a unique message ID, passed as the second argument to emit:

Spout:

 //ties in tuple to this UID
 _collector.emit(new Values(queue.dequeue()), uniqueID)

Then Bolt1 acks the tuple only after it has emitted to Bolt2:

Bolt1:

 //emit first then ack
 _collector.emit(tuple, new Values("stuff")) //**anchoring** - read below to see what it means
 _collector.ack(tuple) 

At this point the tuple from the spout has been acked in Bolt1, but the newly emitted tuple "stuff" sent to Bolt2 is "anchored" to the tuple from the spout. This means the new tuple still needs to be acked later on; otherwise the spout tuple times out and is resent by the spout.

Bolt2:

 _collector.ack(tuple) 

Bolt2 needs to ack the tuple received from Bolt1, which sends in the last ack the spout was waiting for. If Bolt2 emits an anchored tuple at this point, then there must be a Bolt3 that receives it and acks it. If the tuple is not acked at the last point, the spout will time it out and resend it.
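The spout side of this can be sketched without any Storm dependency. Storm's spout interface has nextTuple, ack(msgId) and fail(msgId) callbacks; the class below is only a hypothetical stand-in (ReplayingSource and its method names are my own, not part of Storm) that shows the bookkeeping a spout would typically do so that a failed or timed-out message ID can be emitted again.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a spout's bookkeeping; NOT a Storm class.
final class ReplayingSource(initial: Seq[String]) {
  private val queue = mutable.Queue.empty[String] ++= initial
  private val inFlight = mutable.Map.empty[Long, String] // messageId -> payload
  private var nextId = 0L

  // Plays the role of nextTuple: take one payload and remember it
  // under a fresh message ID until it is acked or failed.
  def next(): Option[(Long, String)] =
    if (queue.isEmpty) None
    else {
      val payload = queue.dequeue()
      nextId += 1
      inFlight(nextId) = payload
      Some((nextId, payload))
    }

  // Plays the role of ack(msgId): the whole tree completed, forget the payload.
  def ack(messageId: Long): Unit = inFlight.remove(messageId)

  // Plays the role of fail(msgId): put the payload back so it is emitted again.
  def fail(messageId: Long): Unit =
    inFlight.remove(messageId).foreach(payload => queue += payload)

  def pendingCount: Int = inFlight.size
}
```

In this sketch a replayed payload gets a new message ID on its next emit. A real spout could just as well reuse the old ID; that choice is an assumption of the example, not something the answer specifies.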

Each time an emit from one bolt to the next is anchored, a new node is added to a "tree" structure. In my case it is more like a list, since I never send the same tuple to two or more bolts; I have a 1-to-1 relationship.

All nodes in the tree need to be acked, and only then is the spout tuple marked as fully processed. If a tuple is emitted with a UID and anchored later on but never acked, it stays tracked in memory as pending until it is acked or the message timeout expires.
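To make the "every node must be acked" rule concrete, here is a toy model of the XOR-based tracking that Storm's documentation describes for its acker tasks. It is a simplified sketch, not Storm's actual implementation: AckTracker, AckTrackerDemo and the fixed tuple IDs are hypothetical names and values chosen for illustration (Storm uses random 64-bit IDs).

```scala
import scala.collection.mutable

// Toy model of ack tracking for spout tuples; NOT Storm's real classes.
// A tuple ID is XORed in when the tuple is emitted and XORed in again
// when it is acked. The running value for a message returns to 0 only
// when every emitted tuple in its tree has also been acked.
final class AckTracker {
  private val pending = mutable.Map.empty[String, Long] // messageId -> XOR of tuple IDs

  private def xor(messageId: String, tupleId: Long): Unit =
    pending(messageId) = pending.getOrElse(messageId, 0L) ^ tupleId

  def emitted(messageId: String, tupleId: Long): Unit = xor(messageId, tupleId)

  // Returns true when this ack completes the whole tree.
  def acked(messageId: String, tupleId: Long): Boolean = {
    xor(messageId, tupleId)
    val done = pending(messageId) == 0L
    if (done) pending.remove(messageId)
    done
  }
}

object AckTrackerDemo {
  // Replays the Spout -> Bolt1 -> Bolt2 chain from the answer.
  def run(): (Boolean, Boolean) = {
    val tracker = new AckTracker
    val spoutTuple = 0x1111L // illustrative IDs
    val stuffTuple = 0x2222L
    tracker.emitted("msg-1", spoutTuple)                // spout emits with a message ID
    tracker.emitted("msg-1", stuffTuple)                // Bolt1 emits "stuff", anchored
    val afterBolt1 = tracker.acked("msg-1", spoutTuple) // Bolt1 acks its input
    val afterBolt2 = tracker.acked("msg-1", stuffTuple) // Bolt2 acks "stuff"
    (afterBolt1, afterBolt2)
  }
}
```

AckTrackerDemo.run() returns (false, true): the tree is still open after Bolt1 acks, because the anchored "stuff" tuple is outstanding, and it completes only when Bolt2 acks. In this toy model, acking in Bolt1 before the anchored emit would bring the value to 0 too early, which is one way to see why the answer emits first and acks second.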

Hope this helps.
