Why are increments not supported in the Dataflow-Bigtable connector?

Question

We have a streaming use case where we want to maintain a counter in Bigtable from the pipeline (for example, the number of items that have finished processing), which requires the increment operation. According to https://cloud.google.com/bigtable/docs/dataflow-hbase, the append and increment operations of the HBase API are not supported by this client. The stated reason is the retry logic in batch mode. But if Dataflow guarantees exactly-once processing, why would supporting increments be a bad idea, given that I know the increment was called only once? I want to understand what I am missing.

Also, is CloudBigtableIO usable in streaming mode, or is it tied to batch mode only? We could use the Bigtable HBase client directly in the pipeline, but the connector has useful properties, such as connection pooling, that we would like to leverage, hence the question.

Recommended answer

Dataflow (like other systems) offers the appearance of exactly-once execution in the presence of failures and retries by requiring that side effects, such as mutating Bigtable, be idempotent. Your code may in fact run more than once for the same element; the guarantee is that the observable result is the same as if it had run once. A "write" is idempotent because a retry simply overwrites the same cell with the same value. Inserts can be made idempotent by including a deterministic "insert ID" that lets the sink deduplicate repeated inserts.
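The idempotency argument above can be illustrated with a small simulation. This is only a sketch: a plain Python dict stands in for the Bigtable table, and `apply_with_retry`, `put`, and `insert_with_id` are hypothetical helpers invented for illustration, not part of the Dataflow or Bigtable APIs.

```python
def apply_with_retry(mutation, table, attempts):
    """Apply the same mutation `attempts` times, as a retried bundle might."""
    for _ in range(attempts):
        mutation(table)


def put(row_key, value):
    """A plain write: every attempt stores the same value in the same cell."""
    def mutation(table):
        table[row_key] = value
    return mutation


def insert_with_id(insert_id, record):
    """An insert keyed by a deterministic ID: repeated attempts are deduplicated."""
    def mutation(table):
        table.setdefault(insert_id, record)
    return mutation


# The dict is a stand-in for a table (an assumption for illustration only).
applied_once = {}
apply_with_retry(put("row-1", "done"), applied_once, attempts=1)
apply_with_retry(insert_with_id("event-42", {"item": "a"}), applied_once, attempts=1)

applied_three_times = {}
apply_with_retry(put("row-1", "done"), applied_three_times, attempts=3)
apply_with_retry(insert_with_id("event-42", {"item": "a"}), applied_three_times, attempts=3)

# Both tables end up identical, so retries are invisible to a reader.
same_result = applied_once == applied_three_times
```

Here `same_result` is `True`: however many times each mutation is replayed, the final table state does not change, which is the property the connector relies on.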

For an increment, that is not the case. If an increment is retried, it is applied again and the counter ends up too high. Because the operation is not idempotent under retries, it cannot provide exactly-once results, and that is why the connector does not support it.
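To make the difference concrete, the sketch below replays one item twice, as a retry would, and compares an increment-based counter with a count derived from idempotent per-item writes. The second approach is an assumption on my part that applies the deterministic-ID idea to counting; the answer does not prescribe it. A `defaultdict` and a dict stand in for Bigtable, and `increment` and `mark_done` are hypothetical helpers, not connector APIs.

```python
from collections import defaultdict


def increment(counters, key, amount=1):
    """Non-idempotent: every call changes the stored value."""
    counters[key] += amount


def mark_done(cells, item_id):
    """Idempotent: writing the same marker cell again changes nothing."""
    cells[item_id] = True


items = ["a", "b", "c"]
# Simulate a retry: item "b" is delivered to the sink a second time.
deliveries = items + ["b"]

counters = defaultdict(int)  # stand-in for a counter cell
cells = {}                   # stand-in for one marker cell per item

for item_id in deliveries:
    increment(counters, "finished")
    mark_done(cells, item_id)

incremented_total = counters["finished"]  # 4: the retry was counted twice
marker_total = len(cells)                 # 3: the retry overwrote an existing cell
```

The increment-based total overcounts by one for each replayed element, while the marker-based total stays correct because the repeated write is absorbed. Counting markers at read time has its own cost, so whether it suits a given pipeline is a separate design question.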
