How do we reset the state associated with a Kafka Connect source connector?


Question

We are working with Kafka Connect 2.5.

We are using the Confluent JDBC source connector (although I think this question is mostly agnostic to the connector type) and are consuming some data from an IBM DB2 database onto a topic, using 'incrementing mode' (primary keys) as unique IDs for each record.

That works fine in the normal course of events; the first time the connector starts all records are consumed and placed on a topic, then, when new records are added, they are added to our topic. In our development environment, when we change connector parameters etc., we want to effectively reset the connector on-demand; i.e. have it consume data from the "beginning" of the table again.
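For reference, a connector of this kind might be created along the following lines. This is a hedged sketch only: the connector name, connection URL, schema, table, and column names below are hypothetical placeholders, not taken from the question.

```shell
# Create a hypothetical JDBC source connector in incrementing mode.
# All names, hosts, and credentials are illustrative placeholders.
curl -s -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "db2-source",
    "config": {
      "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
      "connection.url": "jdbc:db2://db2host:50000/MYDB",
      "connection.user": "db2user",
      "connection.password": "********",
      "mode": "incrementing",
      "incrementing.column.name": "ID",
      "table.whitelist": "MYSCHEMA.MYTABLE",
      "topic.prefix": "db2-"
    }
  }'
```

In incrementing mode the connector tracks the highest value of the incrementing column it has seen, which is exactly the state the question is about resetting.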

We thought that deleting the connector (using the Kafka Connect REST API) would do this - and would have the side-effect of deleting all information regarding that connector configuration from the Kafka Connect connect-* metadata topics too.

However, this doesn’t appear to be what happens. The metadata remains in those topics, and when we recreate/re-add the connector configuration (again using the REST API), it 'remembers' the offset it was consuming from in the table. This seems confusing and unhelpful - deleting the connector doesn’t delete its state. Is there a way to more permanently wipe the connector and/or reset its consumption position, short of pulling down the whole Kafka Connect environment, which seems drastic? Ideally we’d like not to have to meddle with the internal topics directly.
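The deletion itself is a single REST call (assuming the default Kafka Connect REST port 8083 and a hypothetical connector name `db2-source`):

```shell
# Remove the connector and its tasks. As described above, this does
# NOT clear the offsets recorded for it in the internal offsets topic.
curl -s -X DELETE http://localhost:8083/connectors/db2-source
```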

Answer

Partial answer to this question: it seems the behaviour we are seeing is to be expected:

If you’re using incremental ingest, what offset does Kafka Connect have stored? If you delete and recreate a connector with the same name, the offset from the previous instance will be preserved. Consider the scenario in which you create a connector. It successfully ingests all data up to a given ID or timestamp value in the source table, and then you delete and recreate it. The new version of the connector will get the offset from the previous version and thus only ingest newer data than that which was previously processed. You can verify this by looking at the offset.storage.topic and the values stored in it for the table in question.
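Inspecting the offsets topic can be sketched with kafkacat (kcat), assuming the default internal topic name `connect-offsets` and a local broker; check `offset.storage.topic` in your worker config for the actual name. Keys are JSON arrays pairing the connector name with a source partition, and values hold the stored offset.

```shell
# Dump keys, values, and partitions from the Connect offsets topic.
# Topic name and broker address are assumptions; adjust to your setup.
kafkacat -b localhost:9092 -t connect-offsets -C \
  -f 'partition %p  key %k  value %s\n' -o beginning
# For a JDBC source in incrementing mode, values look roughly like
# {"incrementing": 42}, keyed by the connector name and table.
```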

At least for the Confluent JDBC connector, there is a workaround to reset the pointer.

Personally, I'm still confused as to why Kafka Connect retains state for a connector after it has been deleted, but that appears to be the designed behaviour. I'd still be interested to hear if there is a better (and supported) way to remove that state.

Another related blog article: https://rmoff.net/2019/08/15/reset-kafka-connect-source-connector-offsets/
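The approach that post describes is to stop or delete the connector, then write a NULL (tombstone) value to the offsets topic under the connector's key, so that a recreated connector with the same name starts from scratch. A hedged sketch follows; the key below is purely illustrative (the exact key format varies by connector and version), so copy the real key verbatim from the topic, and note that the tombstone must land on the same partition as the existing offset record.

```shell
# Stop/delete the connector first, then tombstone its stored offset.
# -K '#' sets the key delimiter; the empty value after '#' combined
# with -Z is produced as NULL. -p targets the partition holding the
# existing offset record (11 here is just an example).
echo '["db2-source",{"table":"MYSCHEMA.MYTABLE"}]#' | \
  kafkacat -b localhost:9092 -t connect-offsets -P -Z -K '#' -p 11
```

Because the offsets topic is compacted, the tombstone eventually removes the old offset entirely; recreating the connector with the same name then re-consumes the table from the beginning.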
