为什么我不应该在Spout.nextTuple()中循环或阻止 [英] Why should I not loop or block in Spout.nextTuple()

查看:181
本文介绍了为什么我不应该在Spout.nextTuple()中循环或阻止的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我看到了许多代码片段,其中在Spout.nextTuple()内部使用了循环(例如,读取整个文件并为每行发出一个元组):

I saw many code snippets in which a loop was used inside Spout.nextTuple() (for example to read a whole file and emit a tuple for each line):

public void nextTuple() {
    // do other stuff here

    // reader might be BufferedReader that is initialized in open()
    String str;
    while((str = reader.readLine()) != null) {
        _collector.emit(new Values(str));
    }

    // do some more stuff here
}

这段代码似乎很简单,但是,我被告知应该在nextTuple()内部不要循环.问题是为什么?

This code seems to be straight forward, however, I was told that one should not loop inside nextTuple(). The question is why?

推荐答案

执行Spout时,它在单个线程中运行.该线程永远"循环并具有多种职责:

When a Spout is executed it runs in a single thread. This thread loops "forever" and has multiple duties:

  1. 致电Spout.nextTuple()
  2. 获取"acks"并进行处理
  3. 检索失败"并处理它们
  4. 超时元组
  1. call Spout.nextTuple()
  2. retrieve "acks" and process them
  3. retrieve "fails" and process them
  4. time-out tuples

要做到这一点,至关重要的是,不要在"nextTuple()"中永远保持永久"(即循环或阻塞),而是在将元组发送给系统后返回(或者如果没有元组可以被发送,则仅返回) ,但请勿阻止).否则,Spout无法正常工作. nextTuple()将由Storm循环调用.因此,在处理了确认/失败消息等之后,对nextTuple()的下一次调用很快发生.

For this to happen, it is essential, that you do not stay "forever" (ie, loop or block) in nextTuple() but return after emitting a tuple to the system (or just return if no tuple can be emitted, but do not block). Otherwise, the Spout cannot does its work properly. nextTuple() will be called in a loop by Storm. Thus, after ack/fail messages are processed etc. the next call to nextTuple() happens quickly.

因此,在单个调用nextTuple()的过程中发出多个元组也被认为是不好的做法.只要代码停留在nextTuple()中,喷口线程就无法(例如)对传入的ack做出反应.这可能导致不必要的超时,因为无法及时处理ack.

Therefore, it is also considered bad practice to emit multiple tuples in a single call to nextTuple(). As long as the code stays in nextTuple(), the spout thread cannot (for example) react on incoming acks. This might lead to unnecessary time-outs because acks cannot be processed timely.

最佳做法是为每次调用nextTuple()发出一个元组.如果没有可用的元组,则应返回(不进行发射),而不要等到可用的元组为止.

Best practice is to emit a single tuple for each call to nextTuple(). If no tuple is available to be emitted, you should return (without emitting) and not wait until a tuple is available.

这篇关于为什么我不应该在Spout.nextTuple()中循环或阻止的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆