Processing with State and Timers

Question

Are there any guidelines or limitations for using stateful processing and timers with the Beam Dataflow runner (as of v2.1.0)? For example, are there limits on the size of state or the frequency of updates? The candidate streaming pipeline would use state and timers extensively for user session state, with Bigtable as durable storage.

Recommended answer

Here is some general advice for your use case:

  • Aggregate multiple elements, then set a timer.
  • Don't create a timer per element, which would be excessive.
  • Try to aggregate state instead of accumulating a large amount of it. For example, when computing a mean, keep a sum and a count rather than storing every number.
  • Consider session windows for this use case.
  • In Dataflow, state is not supported for merging windows, although it is supported in Beam.
  • Choose the state type according to your access pattern, e.g. BagState for blind writes.
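
As a rough illustration of the first three points, here is a minimal plain-Python sketch. It is not the Beam API: `KeyState`, `SessionMeanSketch`, `flush_delay` and the explicit `now` argument are all hypothetical names made up for this example. It keeps only a sum and a count per key, and sets at most one pending timer per key instead of one per element.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class KeyState:
    """Hypothetical per-key state: two numbers instead of a growing list."""
    total: float = 0.0
    count: int = 0
    timer_at: Optional[float] = None  # at most one pending timer per key


class SessionMeanSketch:
    """Toy stand-in for a stateful transform (an assumption, not Beam code)."""

    def __init__(self, flush_delay: float) -> None:
        self.flush_delay = flush_delay
        self.states: Dict[str, KeyState] = {}
        self.timers_set = 0  # counts how many timers were ever created

    def process(self, key: str, value: float, now: float) -> None:
        state = self.states.setdefault(key, KeyState())
        # Aggregate into sum and count; the raw values are never stored.
        state.total += value
        state.count += 1
        # Only the first element of a batch sets the timer.
        if state.timer_at is None:
            state.timer_at = now + self.flush_delay
            self.timers_set += 1

    def fire_timers(self, now: float) -> List[Tuple[str, float]]:
        """Emit (key, mean) for every key whose timer is due, then clear it."""
        output = []
        for key, state in list(self.states.items()):
            if state.timer_at is not None and state.timer_at <= now:
                output.append((key, state.total / state.count))
                del self.states[key]
        return output
```

Feeding three values for one key sets a single timer, and `fire_timers` emits the mean for that key once the timer is due. In a real pipeline the same shape would presumably map onto a combining state plus one timer per key, but that mapping is an assumption here rather than something stated in the answer.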

Here is an informative blog post with more information on state: "Stateful processing with Apache Beam."
