Avoid Data Loss While Processing Messages from Kafka


Problem Description

I'm looking for the best approach to design my Kafka consumer. Basically, I would like to find the best way to avoid data loss in case there are any exceptions/errors while processing the messages.

My use case is as below.

a) The reason I am using a SERVICE to process the messages is that, in the future, I plan to write an ERROR PROCESSOR application which would run at the end of the day and try to process the failed messages again (not all messages, only those which failed because of missing dependencies, such as a parent entity).

b) I want to make sure there is zero message loss, so I will save the message to a file in case there are any issues while saving it to the DB.

c) In the production environment there can be multiple instances of the consumer and services running, so there is a high chance that multiple applications will try to write to the same file.

Q-1) Is writing to a file the only option to avoid data loss?

Q-2) If it is the only option, how can I make sure multiple applications write to the same file and read from it at the same time? Please consider that in the future, once the error processor is built, it might be reading messages from the same file while another application is trying to write to it.

ERROR PROCESSOR - Our source follows an event-driven mechanism, and there is a high chance that sometimes a dependent event (for example, the parent entity of something) might be delayed by a couple of days. In that case, I want my ERROR PROCESSOR to be able to process the same messages multiple times.

Answer

I've run into something similar before. So, diving straight into your questions:

  • Not necessarily. You could instead send those messages back to Kafka on a new topic (let's say, error-topic). Then, when your error processor is ready, it can simply listen to this error-topic and consume the messages as they come in.

  • I think this has been addressed by the answer to the first question. So, instead of writing to and reading from a file and opening multiple file handles to do this concurrently, Kafka is likely the better choice, as it is designed for exactly this kind of problem.
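The error-topic suggestion above is essentially the dead-letter pattern. A minimal sketch of the flow follows; note that the topic name, the `process_message` logic, and the in-memory `StubProducer` (standing in for a real Kafka producer such as the one in `confluent-kafka`) are all illustrative assumptions, not part of the original answer:

```python
class StubProducer:
    """Stands in for a Kafka producer; collects messages per topic."""
    def __init__(self):
        self.topics = {}

    def send(self, topic, value):
        self.topics.setdefault(topic, []).append(value)


ERROR_TOPIC = "error-topic"  # name taken from the answer's example


def process_message(msg):
    """Hypothetical business logic: fails when the parent entity is missing."""
    if msg.get("parent_id") is None:
        raise KeyError("parent entity missing")
    return msg["payload"].upper()


def consume(messages, producer):
    """Process each message; on failure, route it to the error topic."""
    results = []
    for msg in messages:
        try:
            results.append(process_message(msg))
        except Exception as exc:
            # Preserve the original message plus the failure reason, so the
            # end-of-day ERROR PROCESSOR has everything it needs to retry.
            producer.send(ERROR_TOPIC, {"original": msg, "error": str(exc)})
    return results


producer = StubProducer()
batch = [
    {"parent_id": 1, "payload": "ok"},
    {"parent_id": None, "payload": "orphan"},  # dependency missing
]
print(consume(batch, producer))           # ['OK']
print(len(producer.topics[ERROR_TOPIC]))  # 1
```

The key point is that a failed message is never dropped: it is re-published with its error context, so durability is delegated to Kafka rather than to shared files.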

Note: The following point is just some food for thought based on my limited understanding of your problem domain, so you may safely choose to ignore it.

One more point worth considering in the design of your service component: you might as well merge points 4 and 5 by sending all the error messages back to Kafka. That would let you handle all error messages in a consistent way, as opposed to putting some in the error DB and some in Kafka.

EDIT: Based on the additional information on the ERROR PROCESSOR requirement, here's a diagrammatic representation of the solution design.

I've deliberately kept the output of the ERROR PROCESSOR abstract for now, just to keep it generic.
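One way to satisfy the requirement that the ERROR PROCESSOR can process the same message multiple times is to have it re-queue any message whose dependency is still absent, so the message is retried on the next run. The sketch below is a hypothetical illustration of that loop; the function name, the `known_parents` set, and the `requeue` callback (which would re-publish to the error topic in a real system) are all assumptions:

```python
def run_error_processor(failed_messages, known_parents, requeue):
    """Retry failed messages; re-queue those whose parent is still absent.

    failed_messages: dicts drained from the error topic.
    known_parents:   ids of parent entities that have arrived by now.
    requeue:         callback that puts a message back on the error topic,
                     so the next end-of-day run picks it up again.
    """
    recovered = []
    for msg in failed_messages:
        if msg["parent_id"] in known_parents:
            recovered.append(msg["payload"])  # dependency finally arrived
        else:
            requeue(msg)                      # retry on a later run
    return recovered


requeued = []
failed = [
    {"parent_id": 7, "payload": "a"},  # parent arrived since the failure
    {"parent_id": 9, "payload": "b"},  # parent still delayed by days
]
print(run_error_processor(failed, {7}, requeued.append))  # ['a']
print(len(requeued))                                      # 1
```

Because a still-failing message simply goes back onto the error topic, a parent entity that arrives a couple of days late is handled naturally: the message keeps cycling until its dependency shows up.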

I hope this helps!

