kafka-python consumer start reading from offset (automatically)


Question

I'm trying to build an application with kafka-python where a consumer reads data from a range of topics. It is extremely important that the consumer never reads the same message twice, but also never misses a message.

Everything seems to be working fine, except when I turn off the consumer (e.g. on failure) and try to start reading from an offset. I can only read all the messages from the topic (which creates double reads) or listen for new messages only (and miss messages that were emitted during the downtime). I don't encounter this problem when pausing the consumer.

I created an isolated simulation in order to try to solve the problem.

Here is the generic producer:

from time import sleep
from json import dumps
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers=['localhost:9092'])

x = 0  # set manually to skip already-sent numbers and avoid duplicates

for e in range(1000):
    if e > x:
        # Serialize the payload as JSON and send it to the 'numtest' topic
        data = dumps({'number': e}).encode('utf-8')
        producer.send('numtest', value=data)
        print(e, 'sent.')
        sleep(5)

And the consumer. If auto_offset_reset is set to 'earliest', all the messages are read again on restart. If auto_offset_reset is set to 'latest', messages emitted during the downtime are never read.

from kafka import KafkaConsumer
from pymongo import MongoClient
from json import loads

## Retrieve data from kafka (WHAT ABOUT MISSED MESSAGES?)
consumer = KafkaConsumer('numtest', bootstrap_servers=['localhost:9092'],
                         auto_offset_reset='earliest', enable_auto_commit=True,
                         auto_commit_interval_ms=1000)


## Connect to database
client = MongoClient('localhost:27017')
collection = client.counttest.counttest

## Deserialize each message and insert it into MongoDB
for message in consumer:
    message = loads(message.value.decode('utf-8'))
    collection.insert_one(message)
    print('{} added to {}'.format(message, collection))

I feel like the auto-commit isn't working properly.
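
One way to test that feeling is to ask the broker what offset, if any, has actually been committed. A minimal diagnostic sketch, assuming the same local broker and topic; note that committed offsets are stored per consumer group, so the group_id here is a placeholder name (with group_id left as None, as in the consumer above, there is nothing committed to inspect):

from kafka import KafkaConsumer, TopicPartition

# Diagnostic only: look up the last committed offset for partition 0 of 'numtest'
consumer = KafkaConsumer(bootstrap_servers=['localhost:9092'],
                         group_id='my-group')  # committed offsets live per group
tp = TopicPartition('numtest', 0)
print('committed offset:', consumer.committed(tp))  # None if nothing was ever committed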

I know that this question is similar to this one, but I would like a specific solution.

Thanks for your help.

Answer

You are getting this behavior because your consumer is not using a Consumer Group. With a Consumer Group, the consumer will regularly commit (save) its position to Kafka. That way if it's restarted it will pick up from its last committed position.

To make your consumer use a Consumer Group, you need to set group_id when constructing it. See the group_id description in the docs:

The name of the consumer group to join for dynamic partition assignment (if enabled), and to use for fetching and committing offsets. If None, auto-partition assignment (via group coordinator) and offset commits are disabled. Default: None

For example:

consumer = KafkaConsumer('numtest', bootstrap_servers=['localhost:9092'],
                         auto_offset_reset='earliest', enable_auto_commit=True,
                         auto_commit_interval_ms=1000, group_id='my-group')
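
Worth adding: even with a consumer group, auto-commit gives at-least-once delivery at best, because offsets are committed on a timer rather than after your database write. A crash can replay messages processed since the last commit, or lose a message whose offset was auto-committed before your insert finished. If you want the commit to happen only after the MongoDB insert has succeeded, you can disable auto-commit and commit manually. A minimal sketch of that variant, reusing the broker, topic, and collection from the question (committing after every single message is a deliberate simplification that trades throughput for safety):

from json import loads
from kafka import KafkaConsumer
from pymongo import MongoClient

consumer = KafkaConsumer('numtest', bootstrap_servers=['localhost:9092'],
                         auto_offset_reset='earliest',
                         enable_auto_commit=False,  # commit by hand instead
                         group_id='my-group')

client = MongoClient('localhost:27017')
collection = client.counttest.counttest

for message in consumer:
    record = loads(message.value.decode('utf-8'))
    collection.insert_one(record)
    # Commit only after the write succeeded: a crash before this line means
    # the message is re-read on restart, never silently skipped.
    consumer.commit()

This is still at-least-once, not exactly-once: if the process dies between insert_one and commit, the same message is read again on restart, so the MongoDB write should be made idempotent (for example, keyed on the message number).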
