How do we reset the state associated with a Kafka Connect source connector?

Question

We are working with Kafka Connect 2.5.

We are using the Confluent JDBC source connector (although I think this question is mostly agnostic to the connector type) to load data from an IBM DB2 database onto a topic, using incrementing mode, with the primary key serving as the unique ID for each record.

This works fine in the normal course of events: the first time the connector starts, all records are consumed and placed on a topic; then, as new records are added to the table, they are added to our topic. In our development environment, when we change connector parameters and so on, we want to reset the connector on demand, i.e. have it consume data from the "beginning" of the table again.

We thought that deleting the connector (using the Kafka Connect REST API) would do this, and would have the side effect of deleting all information about that connector's configuration from the Kafka Connect connect-* metadata topics too.

However, this doesn't appear to be what happens. The metadata remains in those topics, and when we recreate the connector configuration (again using the REST API), it "remembers" the offset it was consuming from in the table. This seems confusing and unhelpful: deleting the connector doesn't delete its state. Is there a way to wipe the connector more permanently and/or reset its consumption position, short of pulling down the whole Kafka Connect environment, which seems drastic? Ideally we'd like not to have to meddle with the internal topics directly.

Recommended answer

Partial answer to this question: it seems the behaviour we are seeing is to be expected:

If you're using incremental ingest, what offset does Kafka Connect have stored? If you delete and recreate a connector with the same name, the offset from the previous instance will be preserved. Consider the scenario in which you create a connector. It successfully ingests all data up to a given ID or timestamp value in the source table, and then you delete and recreate it. The new version of the connector will get the offset from the previous version and thus only ingest newer data than that which was previously processed. You can verify this by looking at the offset.storage.topic and the values stored in it for the table in question.
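To make "the values stored in it" more concrete, here is a minimal sketch of decoding one record from the offsets topic. It assumes the worker stores offsets as JSON, with the key shaped as [connector name, source partition] and the value holding the source offset. The connector name, table name, and offset value are hypothetical, and the exact fields a given JDBC connector version writes may differ.

```python
import json


def parse_offset_record(key: bytes, value):
    """Decode one record read from the Connect offsets topic.

    Assumed layout (JSON): key = [connector_name, source_partition],
    value = source_offset, or None if the record is a tombstone.
    """
    connector, source_partition = json.loads(key)
    source_offset = json.loads(value) if value is not None else None
    return connector, source_partition, source_offset


# Hypothetical record, shaped like what a JDBC source in incrementing
# mode might store for one table.
key = b'["jdbc-db2-source",{"protocol":"1","table":"MYSCHEMA.ORDERS"}]'
value = b'{"incrementing":1042}'

connector, source_partition, source_offset = parse_offset_record(key, value)
print(connector, source_partition["table"], source_offset)
```

Because the connector name is part of the key, a connector recreated under the same name finds the old record again, which matches the behaviour described above.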

At least for the Confluent JDBC connector, there is a workaround to reset the pointer.
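The link to that workaround is not preserved here, so the following is not necessarily the one the answer meant. Since the stored offset is tied to the connector name, one simple option in a development environment is to recreate the connector under a new name. A minimal sketch, in which the connector name, connection URL, table, and column are all hypothetical:

```python
import copy


def renamed_connector(payload: dict, suffix: str) -> dict:
    """Return a copy of a connector payload with a suffix added to its name.

    A connector created under a name that has no stored offsets starts
    reading the table from the beginning.
    """
    fresh = copy.deepcopy(payload)
    fresh["name"] = f'{payload["name"]}-{suffix}'
    return fresh


# Hypothetical payload in the shape accepted by POST /connectors.
original = {
    "name": "jdbc-db2-source",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:db2://db2-host:50000/MYDB",
        "mode": "incrementing",
        "incrementing.column.name": "ID",
        "table.whitelist": "ORDERS",
        "topic.prefix": "db2-",
    },
}

fresh = renamed_connector(original, "v2")
print(fresh["name"])
# Next steps (not executed here): DELETE /connectors/jdbc-db2-source,
# then POST the new payload to /connectors on the Connect worker.
```

Note that re-reading the table into the same topic writes the existing rows again, so downstream consumers see duplicates unless the topic is also recreated.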

Personally, I'm still confused about why Kafka Connect retains state for a connector at all once it has been deleted, but it seems that this is the designed behaviour. I would still be interested to hear if there is a better (and supported) way to remove that state.

Another relevant blog post: https://rmoff.net/2019/08/15/reset-kafka-connect-source-connector-offsets/
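If touching the internal topic turns out to be acceptable after all, the general idea is to write a tombstone (a record with a null value) for the connector's key to the offsets topic. The sketch below only prepares those tombstones from records already dumped from the topic; it does not talk to Kafka. It assumes JSON keys of the form [connector name, source partition]. The record tuples and names are hypothetical. The original key bytes and topic partition are reused as-is rather than rebuilt, since the tombstone should land on the same key in the same partition as the record it replaces.

```python
import json


def build_tombstones(records, connector_name):
    """Prepare tombstones for every offset key owned by one connector.

    records: iterable of (topic_partition, key_bytes, value_bytes) tuples
    dumped from the offsets topic. Returns (topic_partition, key_bytes, None)
    tuples; producing them with a Kafka client is left to the reader.
    """
    targets = {}
    for topic_partition, key, _value in records:
        name, _source_partition = json.loads(key)
        if name == connector_name:
            # Keep the partition on which this key was most recently seen.
            targets[key] = topic_partition
    return [(tp, key, None) for key, tp in targets.items()]


# Hypothetical dump of the offsets topic.
records = [
    (20, b'["jdbc-db2-source",{"protocol":"1","table":"MYSCHEMA.ORDERS"}]',
     b'{"incrementing":1042}'),
    (7, b'["other-connector",{"protocol":"1","table":"MYSCHEMA.ITEMS"}]',
     b'{"incrementing":5}'),
]

tombstones = build_tombstones(records, "jdbc-db2-source")
for topic_partition, key, value in tombstones:
    print(topic_partition, key.decode("utf-8"), value)
```

This is probably best done while the connector is not running, on the assumption that a running task would otherwise commit its in-memory offset again.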
