Frequent "offset out of range" messages, partitions deserted by consumer

Question

We are running a 3-node Kafka 0.10.0.1 cluster. We have a consumer application with a single consumer group that subscribes to multiple topics. We are seeing strange behaviour in the consumer logs, starting with these lines:

 Fetch offset 1109143 is out of range for partition email-4, resetting offset
 Fetch offset 952168 is out of range for partition email-7, resetting offset
 Fetch offset 945796 is out of range for partition email-5, resetting offset
 Fetch offset 950900 is out of range for partition email-0, resetting offset
 Fetch offset 953163 is out of range for partition email-3, resetting offset
 Fetch offset 1118389 is out of range for partition email-6, resetting offset
 Fetch offset 1112177 is out of range for partition email-2, resetting offset
 Fetch offset 1109539 is out of range for partition email-1, resetting offset

Some time later we saw these logs:

[2018-06-08 19:45:28] :: INFO  :: ConsumerCoordinator:333 - Revoking previously assigned partitions [sms-4, sms-3, sms-0, sms-2, sms-1] for group notifications-consumer
[2018-06-08 19:45:28] :: INFO  :: AbstractCoordinator:381 - (Re-)joining group notifications-consumer
[2018-06-08 19:45:28] :: INFO  :: AbstractCoordinator$1:349 - Successfully joined group notifications-consumer with generation 3063
[2018-06-08 19:45:28] :: INFO  :: AbstractCoordinator$1:349 - Successfully joined group notifications-consumer with generation 3063
[2018-06-08 19:45:28] :: INFO  :: AbstractCoordinator$1:349 - Successfully joined group notifications-consumer with generation 3063
[2018-06-08 19:45:28] :: INFO  :: AbstractCoordinator$1:349 - Successfully joined group notifications-consumer with generation 3063
[2018-06-08 19:45:28] :: INFO  :: AbstractCoordinator$1:349 - Successfully joined group notifications-consumer with generation 3063
[2018-06-08 19:45:28] :: INFO  :: AbstractCoordinator$1:349 - Successfully joined group notifications-consumer with generation 3063
[2018-06-08 19:45:28] :: INFO  :: AbstractCoordinator$1:349 - Successfully joined group notifications-consumer with generation 3063
[2018-06-08 19:45:28] :: INFO  :: AbstractCoordinator$1:349 - Successfully joined group notifications-consumer with generation 3063
[2018-06-08 19:45:28] :: INFO  :: ConsumerCoordinator:225 - Setting newly assigned partitions [sms-8, sms-7, sms-9, sms-6, sms-5] for group notifications-consumer

I noticed that one of our topics did not appear in the "Setting newly assigned partitions" list. That topic then had no consumers attached to it for at least 8 hours. It only started being consumed again when someone restarted the application. What could be going wrong here?

This is the consumer configuration:

auto.commit.interval.ms = 3000
auto.offset.reset = latest
bootstrap.servers = [x.x.x.x:9092, x.x.x.x:9092, x.x.x.x:9092]
check.crcs = true
client.id =
connections.max.idle.ms = 540000
enable.auto.commit = true
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = otp-notifications-consumer
heartbeat.interval.ms = 3000
interceptor.classes = null
key.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
max.partition.fetch.bytes = 1048576
max.poll.interval.ms = 300000
max.poll.records = 50
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.sample.window.ms = 30000
partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
receive.buffer.bytes = 65536
reconnect.backoff.ms = 50
request.timeout.ms = 305000
retry.backoff.ms = 100
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.mechanism = GSSAPI
security.protocol = SSL
send.buffer.bytes = 131072
session.timeout.ms = 300000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = null
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = /x/x/client.truststore.jks
ssl.truststore.password = [hidden]
ssl.truststore.type = JKS
value.deserializer = class org.apache.kafka.common.serialization.StringDeserializer

The topic that was orphaned has 10 partitions, with retention.ms=1800000 and segment.ms=1800000. Please help.

Recommended answer

The "offset out of range" message you are seeing usually indicates that the offset the consumer is at has been deleted on the broker. When the consumer hits this, it uses auto.offset.reset (latest in your configuration) to decide where to resume consuming.
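
To make the reset behaviour concrete, here is a minimal sketch in plain Java of the decision described above. It is a simplified model and not the Kafka client's actual code: the class name, method names, and the example log start and end offsets are hypothetical, chosen only for illustration. Only the fetch offset 1109143 comes from the logs in the question.

```java
// Simplified model of "offset out of range" handling. Not Kafka's real implementation.
final class OffsetResetSketch {

    // Mirrors the three values accepted by auto.offset.reset.
    enum ResetPolicy { EARLIEST, LATEST, NONE }

    // An offset is out of range when it falls outside what the broker still holds.
    static boolean isOutOfRange(long fetchOffset, long logStartOffset, long logEndOffset) {
        return fetchOffset < logStartOffset || fetchOffset > logEndOffset;
    }

    // Returns the offset the consumer would resume from under the given policy.
    static long resolveFetchOffset(long fetchOffset, long logStartOffset,
                                   long logEndOffset, ResetPolicy policy) {
        if (!isOutOfRange(fetchOffset, logStartOffset, logEndOffset)) {
            return fetchOffset; // still available, nothing to reset
        }
        switch (policy) {
            case EARLIEST:
                return logStartOffset; // oldest record still retained
            case LATEST:
                return logEndOffset;   // skip ahead, only new records are read
            default:
                // With "none" the real client surfaces an exception to the application.
                throw new IllegalStateException("Offset " + fetchOffset + " is out of range");
        }
    }

    public static void main(String[] args) {
        // Hypothetical broker state: retention already removed everything below 1,200,000.
        long logStart = 1_200_000L;
        long logEnd = 1_250_000L;
        long resumed = resolveFetchOffset(1_109_143L, logStart, logEnd, ResetPolicy.LATEST);
        System.out.println("resume at " + resumed);
    }
}
```

Under this model, a reset with latest silently skips every record between the old position and the end of the log, which is worth keeping in mind when these messages appear frequently.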

With retention.ms=1800000 (30 minutes), you are only keeping data for a very short time, so it is expected that if you restart the consumer after several hours, the data is gone.
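
The arithmetic can be checked with a few lines of Java. This is only a rough sketch: the class and method names are hypothetical, and it treats retention.ms as a lower bound on how long records are kept. Brokers delete whole closed segments, so actual deletion may happen somewhat later than the configured value.

```java
import java.time.Duration;

// Rough check of whether a consumer gap outlasts the topic's retention window.
final class RetentionWindowSketch {

    // True when records at the consumer's old position may already have been deleted.
    static boolean mayHaveExpired(Duration consumerGap, Duration retention) {
        return consumerGap.compareTo(retention) > 0;
    }

    public static void main(String[] args) {
        Duration retention = Duration.ofMillis(1_800_000L); // retention.ms from the question
        Duration gap = Duration.ofHours(8);                 // time the topic went unconsumed

        System.out.println("retention in minutes: " + retention.toMinutes());
        System.out.println("data may have expired: " + mayHaveExpired(gap, retention));
    }
}
```

An 8-hour gap is 16 times the 30-minute retention window, so the consumer's last position is very likely no longer present on the broker.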
