Shutdown and update job in Google Dataflow with PubSubIO + message guarantees


Problem description

I have been looking through the source and documentation for Google Dataflow, and I didn't see any mention of the message delivery semantics around PubSubIO.Read.

The problem I am trying to understand is: what kind of message delivery semantics do PubSubIO and Google Dataflow provide? Based on my reading of the source, messages get acked before they are emitted with the ProcessContext#output method. This implies that a Dataflow streaming job will lose messages that have been acked but not passed on.

So how does Dataflow guarantee correctness (if at all) around windows (especially session windows) in the case of failure and redeployment of jobs?
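
For context, here is a minimal sketch of the kind of pipeline the question describes, written against the Apache Beam Java SDK (the successor to the Dataflow SDK referenced above); the project, subscription, and topic names are placeholders:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.StreamingOptions;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.windowing.Sessions;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.TypeDescriptors;
import org.joda.time.Duration;

public class PubsubSessionsPipeline {
  public static void main(String[] args) {
    StreamingOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(StreamingOptions.class);
    options.setStreaming(true);

    Pipeline p = Pipeline.create(options);
    p.apply("ReadFromPubsub",
            PubsubIO.readStrings()
                .fromSubscription("projects/my-project/subscriptions/my-sub"))
        // Session windows: a window per key closes after a 10-minute gap.
        .apply(Window.<String>into(Sessions.withGapDuration(Duration.standardMinutes(10))))
        // Count.perElement contains a GroupByKey; this is where the runner
        // persists elements, and acks back to Pub/Sub happen only after that.
        .apply(Count.perElement())
        .apply(MapElements.into(TypeDescriptors.strings())
            .via((KV<String, Long> kv) -> kv.getKey() + ": " + kv.getValue()))
        .apply("WriteToPubsub",
            PubsubIO.writeStrings().to("projects/my-project/topics/my-output"));
    p.run();
  }
}
```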

Solution

Dataflow doesn't ack messages to Pub/Sub until they have been persisted to intermediate storage within the pipeline (or emitted to the sink, if there is no GroupByKey in the pipeline). We also dedupe messages read from Pub/Sub for a short period, to prevent duplicate delivery caused by missed acks. So Dataflow guarantees exactly-once delivery, modulo any duplicates inserted by publishers at drastically different times.
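
The short dedup window covers redelivery of already-acked messages. For duplicates inserted by the publishers themselves, PubsubIO can additionally dedupe on a caller-supplied message ID; a sketch of the read step, assuming the publisher stamps every message with a uniqueId attribute (the attribute name is illustrative):

```java
// Drop-in replacement for the read step in the sketch above. If the
// publisher sets a "uniqueId" attribute on each message, the runner
// deduplicates messages that arrive with the same attribute value.
PCollection<String> messages =
    p.apply("ReadFromPubsub",
        PubsubIO.readStrings()
            .fromSubscription("projects/my-project/subscriptions/my-sub")
            .withIdAttribute("uniqueId"));
```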

Any intermediate state buffered within a running pipeline is maintained when the pipeline is updated. Streaming pipelines do not fail; instead, they continue to retry elements that hit errors. Either the error is transient and the element will eventually be processed successfully, or, in the case of a consistent exception (a NullPointerException in your code, etc.), you can update the job with corrected code that will then be used to process the failing element.
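
Such an update is done with the Dataflow runner's --update flag; a minimal sketch, assuming the Beam Dataflow runner and a placeholder job name, where the submitted job name must match the job being replaced:

```java
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class UpdateRunningJob {
  public static void main(String[] args) {
    DataflowPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).as(DataflowPipelineOptions.class);
    options.setStreaming(true);
    options.setJobName("my-streaming-job"); // must match the running job's name
    options.setUpdate(true); // same as passing --update on the command line

    Pipeline p = Pipeline.create(options);
    // ... rebuild the pipeline here with the corrected DoFn code ...
    p.run(); // Dataflow swaps in the new code, keeping intermediate state
  }
}
```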

(Note that the implementation is different for the DirectRunner, which may be causing confusion if you are looking at that part of the code.)
