Not clear about the meaning of auto.offset.reset and enable.auto.commit in Kafka

Question

I am new to Kafka, and I don't really understand the meaning of the Kafka configuration. Can anyone explain it to me in a more understandable way?

Here is my code:

import org.apache.kafka.common.serialization.StringDeserializer

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "master:9092,slave1:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "GROUP_2017",
  "auto.offset.reset" -> "latest", // earliest or latest
  "enable.auto.commit" -> (true: java.lang.Boolean)
)

What do these settings mean in my code?

Recommended answer

I will explain the meaning to you, but I highly recommend reading the configuration section of the Kafka website.

"bootstrap.servers" -> "master:9092,slave1:9092"

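Essentially the Kafka cluster configuration: the brokers to connect to, given as comma-separated host:port pairs (here, port 9092 on the hosts master and slave1). The client uses this list only for the initial connection and discovers the rest of the cluster from there, so it does not have to name every broker.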
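The value is just a comma-separated list of host:port pairs. The toy parser below (hypothetical `ToyBootstrap`, not part of Kafka) only splits that string to show its shape; the real client does its own parsing and validation.

```scala
// Toy illustration of the bootstrap.servers format; not Kafka's own parser.
object ToyBootstrap {
  /** Splits "host1:port1,host2:port2" into (host, port) pairs. */
  def parse(servers: String): Seq[(String, Int)] =
    servers.split(",").toSeq
      .map(_.trim)
      .filter(_.nonEmpty)
      .map { entry =>
        val idx = entry.lastIndexOf(':') // the port follows the last colon
        require(idx > 0, s"expected host:port, got '$entry'")
        (entry.substring(0, idx), entry.substring(idx + 1).toInt)
      }
}
```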

 "key.deserializer" -> classOf[StringDeserializer]
 "value.deserializer" -> classOf[StringDeserializer]

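This SO answer explains their purpose: they tell the consumer how to turn the raw bytes of each record's key and value back into objects (here, Strings).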
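A minimal sketch of that conversion, assuming plain UTF-8 text: the toy codec below (hypothetical `ToyStringCodec`, not Kafka's `Serializer`/`Deserializer` interfaces) mimics what `StringSerializer` on the producer side and `StringDeserializer` on the consumer side do with their default UTF-8 encoding.

```scala
import java.nio.charset.StandardCharsets

// Toy illustration only; the real classes implement Kafka's Serializer/Deserializer interfaces.
object ToyStringCodec {
  // Producer side: a String key or value becomes UTF-8 bytes before it is sent.
  def serialize(s: String): Array[Byte] =
    if (s == null) null else s.getBytes(StandardCharsets.UTF_8)

  // Consumer side: the bytes read from Kafka become a String again;
  // a null payload (e.g. a record without a key) stays null.
  def deserialize(data: Array[Byte]): String =
    if (data == null) null else new String(data, StandardCharsets.UTF_8)
}
```

If the producer wrote something other than UTF-8 text (for example Avro or JSON bytes), a different deserializer class would be configured for the key or value.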

"group.id" -> "GROUP_2017"

A consumer process belongs to a group, identified by its group.id. A group can have multiple consumers, and within a group Kafka assigns each partition to only one consumer (for data consumption), although one consumer may be assigned several partitions. If there are more consumers than available partitions, the extra consumers will be idle.
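To see the one-consumer-per-partition rule in action, here is a toy assignment function in plain Scala (hypothetical `ToyAssignment`). Kafka's real strategy is pluggable through the consumer setting `partition.assignment.strategy`; this sketch only mimics a range-style split of a single topic's partitions over the sorted consumer names.

```scala
// Toy model of range-style assignment for one topic inside one consumer group.
object ToyAssignment {
  /** Maps each consumer to the partitions it owns; every partition gets exactly one owner. */
  def rangeAssign(consumers: Seq[String], numPartitions: Int): Map[String, Seq[Int]] = {
    require(consumers.nonEmpty, "a group needs at least one consumer")
    val sorted = consumers.sorted
    val perConsumer = numPartitions / sorted.size // base share for everyone
    val extra = numPartitions % sorted.size       // the first `extra` consumers get one more
    sorted.zipWithIndex.map { case (consumer, i) =>
      val start = i * perConsumer + math.min(i, extra)
      val length = perConsumer + (if (i < extra) 1 else 0)
      consumer -> (start until start + length)    // empty range means an idle consumer
    }.toMap
  }
}
```

With four consumers and three partitions, `c4` receives nothing and sits idle; with two consumers and five partitions, `c1` gets partitions 0 to 2 and `c2` gets 3 and 4.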

"enable.auto.commit" -> (true: java.lang.Boolean)

If this flag is true, the consumer automatically and periodically (every auto.commit.interval.ms, 5 seconds by default) commits the offsets of the messages it has fetched, so that the last offset read is persisted. With the consumer API configured here (through bootstrap.servers), committed offsets are stored in Kafka itself, in the internal __consumer_offsets topic; only the old consumer kept them in Zookeeper. This approach is not the best choice when you want a robust solution for a production system, because it does not ensure that the records you fetched were correctly processed by the logic you wrote in your code. If this flag is false and you never commit offsets yourself, Kafka will not know which offset was read last, so when you restart the process it will start from the 'earliest' or the 'latest' offset, depending on the value of the next flag (auto.offset.reset). Finally, this Cloudera article explains in detail how to manage offsets properly.
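To make the trade-off concrete, here is a toy simulation in plain Scala (hypothetical `ToyCommitModel`, no Kafka client involved). It assumes, as a simplification, that an automatic commit marks a batch as consumed as soon as it is fetched, whereas a manual commit happens only after the whole batch has been processed. The consumer crashes once partway through the first batch and then resumes from the committed offset.

```scala
// Toy model only: simulates commit timing, not real consumer code.
object ToyCommitModel {
  /** Reads `log` in batches of `batchSize`; the first batch is cut short after
    * `crashAfter` records (simulated crash), then reading resumes from the
    * committed offset. Returns every record that was actually processed, in order. */
  def run(log: Vector[String], batchSize: Int, crashAfter: Int,
          commitBeforeProcessing: Boolean): Vector[String] = {
    require(batchSize > 0, "batchSize must be positive")
    var committed = 0                    // offset a restarted consumer resumes from
    var position = 0                     // offset of the next record to fetch
    var crashed = false
    var processed = Vector.empty[String]
    while (position < log.size) {
      val batch = log.slice(position, position + batchSize)
      // Assumed auto-commit behaviour: the batch counts as consumed once fetched.
      if (commitBeforeProcessing) committed = position + batch.size
      val handled =
        if (!crashed) { crashed = true; batch.take(crashAfter) } // first batch may be cut short
        else batch
      processed = processed ++ handled
      if (handled.size < batch.size) {
        position = committed             // restart: resume from the committed offset
      } else {
        position += batch.size
        // Manual commit: only after the whole batch has been processed.
        if (!commitBeforeProcessing) committed = position
      }
    }
    processed
  }
}
```

With the log `a, b, c, d`, batches of 2 and a crash after the first record, committing early yields `a, c, d` (record `b` is lost), while committing after processing yields `a, a, b, c, d` (nothing is lost, but `a` is processed twice). This is why setups that commit after processing typically also need processing that tolerates duplicates.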

"auto.offset.reset" -> "latest"

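This flag tells Kafka where to start reading when your consumer group does not have any committed offset yet (or when the committed offset no longer exists on the server, for example because that data has been deleted). In other words, if you have not committed any offset yet (manually or through the enable.auto.commit flag), the consumer will start either from the 'earliest' available offset or from the 'latest' one, meaning it only sees messages produced from then on.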
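That decision can be modeled as a small pure function. This is a toy sketch in plain Scala (hypothetical `ToyOffsetReset`, not the client's code). The out-of-range case follows Kafka's documented behaviour of also applying `auto.offset.reset` when the committed offset no longer exists on the server; Kafka also accepts a third value, `none`, which makes the consumer fail instead of picking a position (the real client throws its own exception type there).

```scala
// Toy model of how a consumer picks its starting offset for one partition.
object ToyOffsetReset {
  /** committed: the group's committed offset for the partition, if any.
    * logStart:  earliest offset still stored on the broker.
    * logEnd:    offset the next produced record will get. */
  def startingOffset(committed: Option[Long], logStart: Long, logEnd: Long,
                     autoOffsetReset: String): Long =
    committed match {
      // A committed offset that still exists always wins; the reset policy is ignored.
      case Some(offset) if offset >= logStart && offset <= logEnd => offset
      // No commit yet, or the committed offset is out of range: apply the policy.
      case _ =>
        autoOffsetReset match {
          case "earliest" => logStart // re-read everything still retained
          case "latest"   => logEnd   // only records produced from now on
          case "none" =>
            throw new IllegalStateException("no valid committed offset and auto.offset.reset=none")
          case other =>
            throw new IllegalArgumentException(s"unsupported auto.offset.reset value: $other")
        }
    }
}
```

Note the first case: a group that already has a valid commit resumes from it whatever auto.offset.reset says, which is why changing this flag often seems to have no effect for an existing group.id.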
