Why does my kafka tmp folder have almost the same size as the disk?


Problem description

I develop a production Kafka environment with this formation: 3 ZooKeeper servers, 3 Kafka brokers, and 2 Kafka Connect workers. I put my tmp folder side by side with my Kafka main folder, and I run everything in a remote Ubuntu environment, not in Docker.

While operating Kafka, I ran into an error telling me that too much disk was being consumed. I checked my Kafka tmp folder and found its size was almost 2/3 of the disk, which took down my Kafka cluster.
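To see where the space was going, something like this works (a minimal sketch; the path matches log.dir in my broker config below, and the flags assume GNU coreutils):

# list every partition directory under the broker's log dir, largest first
du -sh /home/xxx/tmp/kafka_log1/* | sort -rh | head -20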

I inspected each Kafka log folder and found this (a way to pull the same sizes straight from the brokers is sketched after the list):

  1. 25 connect_offset partitions from worker no. 1, ~21 MB each
  2. 25 connect_offset2 partitions from worker no. 2, ~21 MB each
  3. 25 connect_status partitions from worker no. 1, ~21 MB each
  4. 25 connect_status2 partitions from worker no. 2, ~21 MB each
  5. 50 __consumer_offset partitions from both workers, ~21 MB each
  6. topic offsets, ~21 MB each per topic; I have 2 topics, so I have 6 topic offsets
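The same numbers can be read from the brokers themselves with the kafka-log-dirs tool that ships with Kafka (a sketch; the broker address is a placeholder):

# report the size of every partition directory for the given topic
kafka-log-dirs.sh --bootstrap-server XXX:9099 --describe --topic-list __consumer_offsets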

The problem is that __consumer_offset consumes more disk than the other offsets, and my Kafka config cannot handle it. This is my Kafka configuration:

broker.id=101
port=9099
listeners=PLAINTEXT://0.0.0.0:9099
advertised.listeners=PLAINTEXT://127.0.0.1:9099
num.partitions=3
offsets.topic.replication.factor=3
log.dir=/home/xxx/tmp/kafka_log1
log.cleaner.enable=true
log.cleanup.policy=delete
log.retention.bytes=1073741824
log.segment.bytes=1073741824
log.retention.check.interval.ms=60000
message.max.bytes=1073741824
zookeeper.connect=xxx:2185,xxx:2186,xxx:2187
zookeeper.connection.timeout.ms=7200000
zookeeper.session.timeout.ms=30000
delete.topic.enable=true

And for each topic, this is the config:

kafka-topics.sh --create --zookeeper xxx:2185,xxx:2186,xxx:2187 --replication-factor 3 --partitions 3 --topic $topic_name --config cleanup.policy=delete --config retention.ms=86400000 --config min.insync.replicas=2 --config compression.type=gzip
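For topics that already exist, retention can be tightened afterwards with kafka-configs (a sketch; the ZooKeeper-based form matches the tooling used above, and the retention value is illustrative):

# cap each partition of an existing topic at 512 MiB
kafka-configs.sh --zookeeper xxx:2185,xxx:2186,xxx:2187 --alter --entity-type topics --entity-name $topic_name --add-config retention.bytes=536870912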

And the Connect config looks like this (both Connect configs are identical except for the port and the offset and status topic settings):

bootstrap.servers=XXX:9099,XXX:9098,XXX:9097
group.id=XXX
key.converter.schemas.enable=true
value.converter.schemas.enable=true
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
offset.storage.topic=connect-offsets
offset.storage.replication.factor=3
config.storage.topic=connect-configs
config.storage.replication.factor=3
status.storage.topic=connect-status
status.storage.replication.factor=3
offset.flush.timeout.ms=300000
rest.host.name=xxx
rest.port=8090
connector.client.config.override.policy=All
producer.max.request.size=1073741824
producer.acks=all
producer.enable.idempotence=true
consumer.max.partition.fetch.bytes=1073741824
consumer.auto.offset.reset=latest
consumer.enable.auto.commit=true
consumer.max.poll.interval.ms=5000000
plugin.path=/xxx/connectors

According to several pieces of documentation, Kafka very obviously doesn't need large disk space (the largest recorded tmp folder I've seen is 36 GB).

Answer

"@ 21 MB"是什么意思?您的log.segment.bytes设置为1GB ...

What do you mean "@ 21 MB"? Your log.segment.bytes is set at 1GB...
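Two details worth knowing here: log.retention.bytes is applied per partition, and only closed segments are ever deleted, so with the segment size equal to the retention size a partition carries a full active segment on top of what retention keeps. A hedged variant (the values are illustrative, not a recommendation):

# smaller segments roll and expire sooner; only closed segments can be deleted
# 256 MiB per segment
log.segment.bytes=268435456
# keep at most ~1 GiB per partition
log.retention.bytes=1073741824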

First, never use /tmp for persistent storage. And don't use /home for server data. Always use a separate partition/disk for server data, as well as for /var + /var/logs.
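A minimal sketch of what that looks like in server.properties, assuming a dedicated disk mounted at /var/lib/kafka (the mount point is an assumption, not something from the post):

# point the broker at the dedicated data disk instead of /home or /tmp
log.dirs=/var/lib/kafka/data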

Second, you have 2 Connect clusters. Use the same 3 topics and the same group.id; then you have 1 distributed cluster, and you save yourself 3 extra topics.
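A sketch of the worker properties that merge the two installations into one distributed cluster (the group.id value is hypothetical; the topic names mirror the config above):

# identical on every worker: one group.id and one set of storage topics
group.id=connect-cluster
offset.storage.topic=connect-offsets
config.storage.topic=connect-configs
status.storage.topic=connect-status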

Finally, regarding

"__consumer_offset consumes more disk than the other offsets"

Well, yes. All consumer groups store their offsets there. This will be by far the largest internal topic, depending on your offsets.retention.minutes.
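If old offsets are what is filling the disk, the broker-side retention window can be shortened (a sketch; 1440 minutes = 1 day is an illustrative value, and the default on recent Kafka versions is 7 days):

# how long committed consumer offsets are kept once a group goes inactive
offsets.retention.minutes=1440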

"Kafka doesn't need large disk space"

It doesn't when you are getting started.

I've seen clusters with tens to hundreds of TB of storage.

If you watch Kafka Summit talks from large companies, they are sending GBs of events per second (e.g. Netflix, Spotify, Uber, etc.):

  1. Apache
  2. Confluent
