Kafka as a data store for future events


Question


I have a Kafka cluster which receives messages from a source based on data changes in that source. In some cases the messages are meant to be processed in the future, so I have two options:


  1. Consume all messages and post the ones meant for the future back to Kafka under a different topic (with the date in the topic name), and have a Storm topology that looks for topics with that date's name in it. This ensures messages are processed only on the day they're meant for.
  2. Store them in a separate DB and build a scheduler that reads the messages and posts them to Kafka only on that future date.
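The routing step in option 1 can be sketched as a small helper that derives a date-suffixed topic name from each message's scheduled processing time (the `topic_for` helper, the `future-events` base name, and the `YYYY-MM-DD` suffix scheme are illustrative assumptions, not from the question):

```python
from datetime import datetime, timezone

BASE_TOPIC = "future-events"  # hypothetical base topic name

def topic_for(process_at: datetime) -> str:
    """Route a message to a per-day topic, e.g. future-events.2024-01-15."""
    return f"{BASE_TOPIC}.{process_at.strftime('%Y-%m-%d')}"

# A consumer or topology running on a given day would subscribe
# only to that day's topic:
today_topic = topic_for(datetime(2024, 1, 15, tzinfo=timezone.utc))
print(today_topic)  # future-events.2024-01-15
```

The producer side would then publish each future-dated message with `send(topic_for(process_at), message)`; note this creates one topic per day, which needs its own cleanup story.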


Option 1 is easier to execute but my question is: Is Kafka a durable data store? And has anyone done this sort of eventing with Kafka? Are there any gaping holes in the design?

Answer

You can configure the amount of time your messages stay in Kafka (log.retention.hours).
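For example, as a broker-wide default in `server.properties` (the value shown is Kafka's shipped default of 7 days):

```properties
# Broker-wide default retention for log segments
log.retention.hours=168
```

Retention can also be overridden per topic via the `retention.ms` topic-level config, which matters here because messages scheduled further out than the retention window would be deleted before they are processed.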

But keep in mind that Kafka is meant to be used as a "real-time buffer" between your producers and your consumers, not as durable data store. I don't think Kafka+Storm would be the appropriate tool for your use case. Why not just write your messages in some distributed file system, and schedule a job (MapReduce, Spark...) to process those events?
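The scheduler side of option 2 can be sketched against any durable store; the following uses SQLite purely as a stand-in for the separate DB, and the table/column names (`future_events`, `process_at`) are made up for illustration. A periodic job would run `due_messages` and republish the results to Kafka:

```python
import sqlite3
import time

# In-memory DB as a stand-in for the separate durable store.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE future_events (
        id INTEGER PRIMARY KEY,
        payload TEXT NOT NULL,
        process_at INTEGER NOT NULL   -- unix timestamp when the event is due
    )
""")

def schedule(payload: str, process_at: int) -> None:
    """Persist a message that should only reach Kafka at a future time."""
    conn.execute(
        "INSERT INTO future_events (payload, process_at) VALUES (?, ?)",
        (payload, process_at),
    )

def due_messages(now: int) -> list[str]:
    """Fetch messages whose time has come; a scheduler job would republish these."""
    rows = conn.execute(
        "SELECT id, payload FROM future_events WHERE process_at <= ?", (now,)
    ).fetchall()
    # Delete after fetching so each event is republished at most once.
    conn.executemany(
        "DELETE FROM future_events WHERE id = ?", [(r[0],) for r in rows]
    )
    return [r[1] for r in rows]

now = int(time.time())
schedule("event-a", now - 10)      # already due
schedule("event-b", now + 3600)    # due in an hour
print(due_messages(now))  # ['event-a']
```

A real deployment would wrap `due_messages` in a cron-style loop and handle the case where republishing to Kafka fails after the delete (e.g. delete only after the produce is acknowledged).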

