Avoid Data Loss While Processing Messages from Kafka


Problem Description

I'm looking for the best approach to designing my Kafka consumer. Basically, I want to know the best way to avoid data loss in case any exceptions or errors occur while processing messages.

My use case is as follows.

a) The reason I am using a SERVICE to process the messages is that in the future I plan to write an ERROR PROCESSOR application which will run at the end of the day and try to process the failed messages again (not all messages, but those that failed because of a missing dependency, such as a missing parent).

b) I want to make sure there is zero message loss, so I will save the message to a file in case there are any issues while saving the message to the DB.

c) In a production environment there can be multiple instances of the consumer and services running, so there is a high chance that multiple applications will try to write to the same file.

Q-1) Is writing to a file the only option to avoid data loss?

Q-2) If it is the only option, how do I make sure multiple applications can write to and read from the same file at the same time? Consider that in the future, once the error processor is built, it might be reading messages from that file while another application is trying to write to it.

ERROR PROCESSOR - Our source follows an event-driven mechanism, and there is a high chance that sometimes a dependent event (for example, the parent entity of something) gets delayed by a couple of days. So in that case, I want my ERROR PROCESSOR to be able to process the same messages multiple times.

Recommended Answer

I've run into something similar before. So, diving straight into your questions:

  • Not necessarily. You could instead send those messages back to Kafka on a new topic (say, error-topic). Then, when your error processor is ready, it can simply listen to this error-topic and consume the messages as they come in.

  • I think this question is addressed by the answer to the first one. Instead of writing to and reading from a file, and opening multiple file handles to do so concurrently, Kafka is likely the better choice, since it is designed for exactly this kind of problem.
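The error-topic routing described above can be sketched roughly as follows. This is a minimal illustration, not a production consumer: the `process` function, the message shape, the topic name, and the in-memory producer stand-in are all assumptions made here so the example is self-contained; with a real cluster you would replace the stand-in with an actual Kafka producer (e.g. from confluent-kafka or kafka-python).

```python
# Sketch: route messages that fail processing to an error topic instead of a file.
# The producer is an in-memory stand-in so the logic runs without a broker; in a
# real deployment it would publish to the "error-topic" Kafka topic.

class InMemoryProducer:
    """Stand-in for a Kafka producer; collects (topic, value) pairs."""
    def __init__(self):
        self.sent = []

    def send(self, topic, value):
        self.sent.append((topic, value))

def process(message):
    """Hypothetical business logic; raises if a dependency is missing."""
    if message.get("parent_id") is None:
        raise ValueError("parent entity missing")
    return message["payload"].upper()

def consume_batch(messages, producer, error_topic="error-topic"):
    """Process each message; on failure, forward the raw message to the error topic."""
    results = []
    for msg in messages:
        try:
            results.append(process(msg))
        except Exception:
            # Zero loss: the failed message is retained in Kafka, not dropped.
            producer.send(error_topic, msg)
    return results

producer = InMemoryProducer()
batch = [
    {"parent_id": 1, "payload": "ok"},
    {"parent_id": None, "payload": "orphan"},  # parent missing -> goes to error-topic
]
print(consume_batch(batch, producer))  # ['OK']
print(producer.sent)                   # one ('error-topic', ...) entry for the orphan
```

Because the error topic is just another Kafka topic, concurrent writers are handled by the broker, which sidesteps the shared-file locking problem from Q-2 entirely.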

Note: The following point is just some food for thought based on my limited understanding of your problem domain, so you may safely choose to ignore it.

One more point worth considering in the design of your service component: you might as well merge points 4 and 5 by sending all error messages back to Kafka. That would let you handle all error messages in a consistent way, as opposed to putting some in the error DB and some in Kafka.

EDIT: Based on the additional information about the ERROR PROCESSOR requirement, here is a diagrammatic representation of the solution design.

I've deliberately kept the output of the ERROR PROCESSOR abstract for now, just to keep it generic.
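Since a dependent event can arrive days late, the ERROR PROCESSOR's retry loop could look roughly like the sketch below: messages whose dependencies are still missing are re-enqueued so the same message can be processed again on a later run. The `dependencies_ready` check, the message shape, and the function names are all hypothetical; in practice the queue would be the error-topic itself, consumed and re-published via a Kafka client.

```python
# Sketch: an end-of-day error processor that retries messages from the error
# topic and re-enqueues those whose dependencies are still missing, so the
# same message can be retried on subsequent runs.

def dependencies_ready(message, known_parents):
    """Hypothetical check: the parent entity must have arrived by now."""
    return message["parent_id"] in known_parents

def run_error_processor(error_queue, known_parents):
    """Drain the queue once; return processed payloads and the still-failing remainder."""
    processed, still_failing = [], []
    for msg in error_queue:
        if dependencies_ready(msg, known_parents):
            processed.append(msg["payload"])
        else:
            still_failing.append(msg)  # goes back to the error topic for the next run
    return processed, still_failing

# First run: parent 42 has not arrived yet, so message "a" is re-enqueued.
queue = [{"parent_id": 42, "payload": "a"}, {"parent_id": 7, "payload": "b"}]
done, queue = run_error_processor(queue, known_parents={7})
print(done, queue)   # ['b'] [{'parent_id': 42, 'payload': 'a'}]

# A later run, after parent 42 finally arrives a couple of days later.
done, queue = run_error_processor(queue, known_parents={7, 42})
print(done, queue)   # ['a'] []
```

In a real system you might also attach a retry-count header when re-publishing, so messages that never succeed can eventually be diverted to a dead-letter topic instead of cycling forever.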

I hope this helps!

