Can a stateful DoFn have state that expires with a TTL? Or is unbounded growth OK?

Question

I have a situation in Apache Beam (running in Dataflow) where I have created a simple stateful DoFn, based on this article. The upstream window is global, and changing it would impact downstream aggregations.

Currently, I am not doing anything to shrink the state, and it would appear to just grow unbounded. Is this true? Is unbounded state growth a problem?

I would like to simply attach a TTL to the state, but don't see this functionality.

I am considering storing my own timestamp on the data and using a timer to clean up the table periodically. Is this advisable?

The data being stored is a cache key on some event data. The cache key tells me that I need to look up a past event's data to hydrate the current event. The stateful DoFn works well for this, yet, as I said, I am concerned that it will grow unbounded. I'm unsure whether that has any consequences in Dataflow.

Recommended answer

State is automatically garbage collected when a window expires. Since you are using the global window, it will never expire, so you will need to manage this yourself with timers.

I don't know the details of your code but your idea sounds about right:

  • Store a timestamp with your state so you know how old it is.
  • Set an event-time timer that repeats periodically:
    • Clean up things in the table older than the TTL.
    • The @OnTimer method can reset the same timer.
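The bookkeeping behind these steps can be sketched without any Beam dependency. The following is only a plain-Python stand-in, not Beam API: `TtlState`, `process_element`, `on_timer`, `ttl`, and `cleanup_interval` are hypothetical names chosen for illustration, and the methods mimic what a `@ProcessElement` and an `@OnTimer` method would do for one key's state.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TtlState:
    """Stand-in for one key's state and timer in a stateful DoFn (not Beam API)."""

    ttl: float               # how long an entry may live, in seconds
    cleanup_interval: float  # how often the cleanup timer repeats, in seconds
    entries: Dict[str, float] = field(default_factory=dict)  # cache key -> timestamp
    timer_at: Optional[float] = None  # when the cleanup timer fires next, if armed

    def process_element(self, cache_key: str, event_time: float) -> None:
        # Store a timestamp with the state so its age is known later.
        self.entries[cache_key] = event_time
        # Arm the periodic timer only if none is pending.
        if self.timer_at is None:
            self.timer_at = event_time + self.cleanup_interval

    def on_timer(self, fire_time: float) -> None:
        # Drop every entry older than the TTL.
        cutoff = fire_time - self.ttl
        self.entries = {k: ts for k, ts in self.entries.items() if ts >= cutoff}
        # Reset the same timer, but only while there is state left to clean.
        self.timer_at = fire_time + self.cleanup_interval if self.entries else None


if __name__ == "__main__":
    state = TtlState(ttl=60.0, cleanup_interval=30.0)
    state.process_element("a", 0.0)
    state.process_element("b", 50.0)
    state.on_timer(90.0)
    print(state.entries, state.timer_at)
```

Letting the timer lapse once the table is empty is a design choice of this sketch: an idle key then holds neither state nor a pending timer, and the next element arms the timer again.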

You could also set a timer directly for each element's TTL, but that will cause many more timers to fire, so it would only be a good fit if volume is low. (But if volume is low, you probably don't have to worry about unbounded growth so much.)
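To make the cost of the per-element alternative concrete, here is a rough, library-free simulation (again not Beam API; `run_per_element_timers` and its arguments are hypothetical names). It gives every element its own expiry timer and counts how many timers fire.

```python
import heapq


def run_per_element_timers(events, ttl):
    """Simulate one expiry timer per element.

    events: iterable of (event_time, cache_key), sorted by event_time.
    Returns (remaining entries, number of timer firings).
    """
    entries = {}  # cache key -> last-seen timestamp
    timers = []   # min-heap of (fire_time, cache_key), one timer per element
    fired = 0
    for event_time, key in events:
        # Fire every timer that is due by the time this element arrives.
        while timers and timers[0][0] <= event_time:
            fire_time, due_key = heapq.heappop(timers)
            fired += 1
            # Expire only if the key was not refreshed after this timer was set.
            if due_key in entries and entries[due_key] + ttl <= fire_time:
                del entries[due_key]
        entries[key] = event_time
        heapq.heappush(timers, (event_time + ttl, key))
    return entries, fired


if __name__ == "__main__":
    events = [(0, "a"), (10, "b"), (20, "a"), (100, "c")]
    print(run_per_element_timers(events, ttl=60))
```

In this simulation each element eventually costs one timer firing, so the firing count grows with the element count, whereas a single repeating cleanup timer fires once per interval regardless of volume.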
