Why are increments not supported in the Dataflow-BigTable connector?


Question

We have a streaming-mode use case in which we want to maintain a counter in BigTable from the pipeline (for example, the number of items that have finished processing), which requires the increment operation. According to https://cloud.google.com/bigtable/docs/dataflow-hbase, this client does not support the append/increment operations of the HBase API. The stated reason is the retry logic in batch mode. But if Dataflow guarantees exactly-once processing, why would supporting increments be a bad idea, given that I know the increment is called only once? What am I missing?

Also, is CloudBigTableIO usable in streaming mode, or is it tied to batch mode only? We could use the BigTable HBase client directly in the pipeline, but the connector has useful properties, such as connection pooling, that we would like to leverage. Hence the question.

Recommended answer

Dataflow (and other systems) offer the appearance of exactly-once execution in the presence of failures and retries by requiring that side effects (such as mutating BigTable) be idempotent. A "write" is idempotent because a retry simply overwrites the earlier value. Inserts can be made idempotent by including a deterministic "insert ID" that deduplicates the insert.
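To make the idempotence argument concrete, here is a minimal sketch in plain Python. It uses toy in-memory stand-ins, not the BigTable or Dataflow API; `run_with_retry`, `put_cell`, `insert_with_id`, and the row and ID values are all hypothetical names chosen for illustration.

```python
# Toy in-memory stand-ins; this is NOT the BigTable or Dataflow API.

def run_with_retry(side_effect, attempts=2):
    """Apply the side effect `attempts` times, as a retried bundle would."""
    for _ in range(attempts):
        side_effect()

# Overwrite-style write: the retry stores the same value again.
table = {}

def put_cell():
    table["row-1"] = "done"

run_with_retry(put_cell)

# Insert with a deterministic ID: the sink drops IDs it has already seen.
seen_ids = set()
rows = []

def insert_with_id(insert_id, payload):
    if insert_id in seen_ids:
        return  # duplicate delivery, ignore it
    seen_ids.add(insert_id)
    rows.append(payload)

run_with_retry(lambda: insert_with_id("item-42", {"status": "done"}))

print(table)      # one cell, same value as after a single attempt
print(len(rows))  # one row, despite two attempts
```

In both cases the state after two attempts matches the state after one, which is what lets a retrying runner look like it executed the side effect exactly once.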

That is not the case for an increment. It is not supported because a retried increment is not idempotent: each retry would add to the counter again, so the connector could not preserve exactly-once semantics.
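The same kind of toy simulation shows why an increment breaks under retry. Again, this is a plain-Python sketch under stated assumptions rather than real connector code; `counter`, `increment`, and `process_bundle` are hypothetical, and the "retry" is modeled simply as processing the same bundle twice.

```python
# Toy in-memory counter; this is NOT the BigTable or Dataflow API.
counter = {"items_done": 0}

def increment(delta=1):
    # Read-modify-write: the result depends on the current value,
    # so applying it twice is different from applying it once.
    counter["items_done"] += delta

def process_bundle(items, attempts):
    """Process a bundle; attempts > 1 models a retry after a failure."""
    for _ in range(attempts):
        for _item in items:
            increment()

# Three items, but the bundle is retried once after its side effects landed.
process_bundle(["a", "b", "c"], attempts=2)

print(counter["items_done"])  # 6, although only 3 items were processed
```

Even if the pipeline code calls the increment once per element, the runner may execute that code more than once for the same element, and the counter has no way to tell a retry from a new event.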
