Not clear about the meaning of auto.offset.reset and enable.auto.commit in Kafka


Question

I am new to Kafka and I don't really understand the meaning of its configuration settings. Can anyone explain them to me more clearly?

Here is my code:

import org.apache.kafka.common.serialization.StringDeserializer

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "master:9092,slave1:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "GROUP_2017",
  "auto.offset.reset" -> "latest", // "earliest" or "latest"
  "enable.auto.commit" -> (true: java.lang.Boolean)
)

What do these settings mean in my code?

Recommended answer

I will explain what each setting means, but I strongly suggest reading the configuration section of the Kafka website.

"bootstrap.servers" -> "master:9092,slave1:9092"

Essentially the Kafka cluster configuration: a comma-separated list of broker host (or IP) and port pairs that the client uses for its initial connection to the cluster.

 "key.deserializer" -> classOf[StringDeserializer]
 "value.deserializer" -> classOf[StringDeserializer]

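This SO answer explains their purpose. In short, Kafka stores keys and values as raw bytes, and the deserializers convert those bytes back into objects (here, Strings).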

"group.id" -> "GROUP_2017"

A consumer process belongs to a consumer group, identified by group.id. A group can have multiple consumers, and within a group Kafka assigns each partition to exactly one consumer (a single consumer may own several partitions). If the number of consumers is greater than the number of available partitions, some consumers will be idle.
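The partition-to-consumer relationship can be sketched with a small model. This is only an illustration: the round-robin rule and the names GroupAssignmentSketch and assign are assumptions made for this example, not Kafka's actual assignor implementation, which is configurable and more involved.

```scala
object GroupAssignmentSketch {
  // Simplified round-robin model (an assumption for illustration, not Kafka's real assignor).
  // Every partition gets exactly one owner; a consumer may own zero, one, or many partitions.
  def assign(partitions: Seq[Int], consumers: Seq[String]): Map[String, Seq[Int]] =
    if (consumers.isEmpty) Map.empty
    else {
      val nothingAssigned = consumers.map(_ -> Seq.empty[Int]).toMap
      partitions.zipWithIndex.foldLeft(nothingAssigned) { case (acc, (partition, i)) =>
        val owner = consumers(i % consumers.size)
        acc.updated(owner, acc(owner) :+ partition)
      }
    }

  def main(args: Array[String]): Unit = {
    // Two partitions, three consumers: c3 receives no partition and stays idle.
    println(assign(Seq(0, 1), Seq("c1", "c2", "c3")))
    // Three partitions, two consumers: c1 owns two partitions.
    println(assign(Seq(0, 1, 2), Seq("c1", "c2")))
  }
}
```

In this model, adding consumers beyond the partition count never increases parallelism, which is why the partition count is the upper bound on useful consumers in one group.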

"enable.auto.commit" -> (true: java.lang.Boolean)

If this flag is true, the consumer periodically commits the offsets of the messages it has fetched, so that Kafka remembers the last offset read for the group. With a consumer configured through bootstrap.servers, as here, those offsets are stored in Kafka itself (in the internal __consumer_offsets topic); only the older ZooKeeper-based consumer kept them in ZooKeeper. This approach is not the best choice for a robust production system, because it does not ensure that the records you fetched were processed correctly by your own logic before their offsets were committed. If the flag is false and you do not commit offsets yourself, Kafka will not know the last offset read, so when you restart the process it will start from the 'earliest' or the 'latest' offset, depending on the value of the next flag (auto.offset.reset). Finally, this Cloudera article explains in detail how to manage offsets properly.
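One common alternative to auto-commit is to commit only after a batch has been processed. The following is a minimal sketch using the plain Java consumer from Scala, not the setup from the question. It assumes kafka-clients 2.0 or later (for poll(Duration)) and Scala 2.13 (for scala.jdk.CollectionConverters); the topic name "my-topic", the process function, and the object name ManualCommitSketch are hypothetical.

```scala
import java.time.Duration
import java.util.{Collections, Properties}

import org.apache.kafka.clients.consumer.KafkaConsumer
import scala.jdk.CollectionConverters._

object ManualCommitSketch {
  // Builds consumer settings with auto-commit switched off.
  def manualCommitProps(servers: String, groupId: String): Properties = {
    val props = new Properties()
    props.setProperty("bootstrap.servers", servers)
    props.setProperty("group.id", groupId)
    props.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.setProperty("enable.auto.commit", "false")
    props
  }

  // Hypothetical processing step; replace with your own logic.
  def process(value: String): Unit = println(value)

  def main(args: Array[String]): Unit = {
    val consumer = new KafkaConsumer[String, String](
      manualCommitProps("master:9092,slave1:9092", "GROUP_2017"))
    try {
      consumer.subscribe(Collections.singletonList("my-topic")) // hypothetical topic name
      while (true) {
        val records = consumer.poll(Duration.ofMillis(500))
        records.asScala.foreach(record => process(record.value()))
        // Commit only after every record in the batch has been processed.
        if (!records.isEmpty) consumer.commitSync()
      }
    } finally consumer.close()
  }
}
```

If processing fails before commitSync() is reached, the offsets stay uncommitted and the same records are delivered again after a restart, so this pattern gives at-least-once rather than exactly-once processing.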

"auto.offset.reset" -> "latest"

This flag tells Kafka where to start reading when the consumer group has no committed offset yet. In other words, if no offset has been persisted for the group (either manually or through the enable.auto.commit flag), the consumer starts from the 'earliest' offset, which is the oldest message still retained, or from the 'latest' offset, which means only messages produced after the consumer starts.
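The decision described above can be written as a small pure function. This is a simplified model under stated assumptions, not Kafka's code: the names OffsetResetSketch, startingOffset, logStart and logEnd are invented for this example, and the model ignores the case where a committed offset is out of range as well as the "none" setting.

```scala
object OffsetResetSketch {
  // Returns the offset a consumer would start reading from in this simplified model.
  // committed: the group's committed offset for the partition, if any.
  // logStart / logEnd: the oldest retained offset and the next offset to be written.
  def startingOffset(committed: Option[Long],
                     autoOffsetReset: String,
                     logStart: Long,
                     logEnd: Long): Long =
    committed match {
      // A committed offset always wins; auto.offset.reset is not consulted.
      case Some(offset) => offset
      case None =>
        autoOffsetReset match {
          case "earliest" => logStart
          case "latest"   => logEnd
          case other =>
            throw new IllegalArgumentException(s"value not covered by this sketch: $other")
        }
    }

  def main(args: Array[String]): Unit = {
    // New group, "latest": skips the 100 existing messages.
    println(startingOffset(None, "latest", 0L, 100L))
    // New group, "earliest": reads everything still retained.
    println(startingOffset(None, "earliest", 0L, 100L))
    // Existing group: resumes at its committed offset regardless of the setting.
    println(startingOffset(Some(42L), "latest", 0L, 100L))
  }
}
```

The third call shows why changing auto.offset.reset has no visible effect on a group that has already committed offsets.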
