为什么我不应该在 Spout.nextTuple() 中循环或阻塞 [英] Why should I not loop or block in Spout.nextTuple()

查看:37
本文介绍了为什么我不应该在 Spout.nextTuple() 中循环或阻塞的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我看到了许多代码片段,其中在 Spout.nextTuple() 中使用了循环(例如读取整个文件并为每一行发出一个元组):

I saw many code snippets in which a loop was used inside Spout.nextTuple() (for example to read a whole file and emit a tuple for each line):

public void nextTuple() {
    // do other stuff here

    // reader might be BufferedReader that is initialized in open()
    String str;
    while((str = reader.readLine()) != null) {
        _collector.emit(new Values(str));
    }

    // do some more stuff here
}

这段代码看起来很简单,但是有人告诉我应该不要在nextTuple()中循环.问题是为什么?

This code seems to be straight forward, however, I was told that one should not loop inside nextTuple(). The question is why?

推荐答案

当 Spout 被执行时,它会在单个线程中运行.该线程永远"循环并具有多项职责:

When a Spout is executed it runs in a single thread. This thread loops "forever" and has multiple duties:

  1. 调用 Spout.nextTuple()
  2. 检索确认"并处理它们
  3. 检索失败"并处理它们
  4. 超时元组

要发生这种情况,至关重要的是,您不要在 nextTuple() 中永远"停留(即循环或块),而是在向系统发出元组后返回(或只是如果不能发出元组,则返回,但不要阻塞).否则,Spout 无法正常工作.nextTuple() 会被 Storm 循环调用.因此,在 ack/fail 消息被处理等之后,对 nextTuple() 的下一次调用会很快发生.

For this to happen, it is essential, that you do not stay "forever" (ie, loop or block) in nextTuple() but return after emitting a tuple to the system (or just return if no tuple can be emitted, but do not block). Otherwise, the Spout cannot does its work properly. nextTuple() will be called in a loop by Storm. Thus, after ack/fail messages are processed etc. the next call to nextTuple() happens quickly.

因此,在对 nextTuple() 的单个调用中发出多个元组也被认为是不好的做法.只要代码保持在 nextTuple() 中,spout 线程就不能(例如)对传入的 ack 做出反应.这可能会导致不必要的超时,因为无法及时处理 ack.

Therefore, it is also considered bad practice to emit multiple tuples in a single call to nextTuple(). As long as the code stays in nextTuple(), the spout thread cannot (for example) react on incoming acks. This might lead to unnecessary time-outs because acks cannot be processed timely.

最佳实践是为每次调用 nextTuple() 发出一个元组.如果没有可用的元组可以发出,你应该返回(不发出)而不是等到元组可用.

Best practice is to emit a single tuple for each call to nextTuple(). If no tuple is available to be emitted, you should return (without emitting) and not wait until a tuple is available.

这篇关于为什么我不应该在 Spout.nextTuple() 中循环或阻塞的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆