Kafka Streams and RPC: is calling REST service in map() operator considered an anti-pattern?

Problem Description

The naive approach to the use case of enriching an incoming stream of events stored in Kafka with reference data is to call, from within the map() operator, an external REST service that provides this reference data, once for each incoming event.

eventStream.map((key, event) -> /* query the external service here, then return the enriched event */)
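
For concreteness, here is a minimal sketch of what such a per-event, synchronous lookup could look like, assuming a Java 11+ HttpClient. The service URL, the hypothetical Event and EnrichedEvent value types (with a getReferenceKey() accessor), and the enrich(...) helper are assumptions made only to illustrate the pattern:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import org.apache.kafka.streams.kstream.KStream;

HttpClient httpClient = HttpClient.newHttpClient();

KStream<String, EnrichedEvent> enrichedStream = eventStream.mapValues(event -> {
    // One blocking HTTP round trip per incoming event.
    HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("https://reference-data-service.example.com/lookup/" + event.getReferenceKey()))
            .GET()
            .build();
    try {
        HttpResponse<String> response =
                httpClient.send(request, HttpResponse.BodyHandlers.ofString());
        // Merge the reference data (here: the raw response body) into the event.
        return enrich(event, response.body());
    } catch (Exception e) {
        // Failure handling (retry, skip, dead-letter topic, ...) is up to the application.
        throw new RuntimeException("Reference data lookup failed for event " + event, e);
    }
});

mapValues() is used rather than map() here because the key does not change, which also avoids flagging the stream for a later repartition.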

Another approach is to have a second stream of events carrying the reference data, store it in a KTable that acts as a lightweight embedded "database", and then join the main event stream against that table.

KStream<String, Object> eventStream = builder.stream(..., "event-topic");
KTable<String, Object> referenceDataTable = builder.table(..., "reference-data-topic");

eventStream
    // Stream-table join: enrich each event with the matching reference data.
    .leftJoin(referenceDataTable, (event, referenceData) -> /* return the enriched event */)
    // Optionally re-key the enriched events before writing them out.
    .map((key, enrichedEvent) -> new KeyValue<>(/* new key */, enrichedEvent))
    .to("enriched-event-topic", ...);

Can the "naive" approach be considered an anti-pattern? Can the "KTable" approach be recommended as the preferred one?

Kafka can easily handle millions of messages per minute. A service that is called from the map() operator would also have to cope with a high load and be highly available, which are extra requirements for the service implementation. But if the service satisfies these criteria, can the "naive" approach be used?

Recommended Answer

Yes, it is OK to do RPC inside Kafka Streams operations such as map(). You just need to be aware of the pros and cons of doing so, see below. Also, you should do any such RPC calls synchronously from within your operations (I won't go into the details of why here; if needed, I'd suggest creating a new question).

Pros of doing RPC calls inside Kafka Streams operations:

  • Your application will fit more easily into an existing architecture, e.g. one where the use of REST APIs and request/response paradigms is common. That means you can make quicker progress on a first proof-of-concept or MVP.
  • In my experience, this approach is easier to understand for many developers (particularly those who are just starting out with Kafka) because they are familiar with making RPC calls in this way from their past projects. Think of it as a helpful, gradual transition from request-response architectures to event-driven architectures (powered by Kafka).
  • Nothing prevents you from starting out with RPC calls and request-response, and then migrating to a more Kafka-idiomatic approach later.

Cons:

  1. You are coupling the availability, scalability, and latency/throughput of your Kafka Streams powered application to the availability, scalability, and latency/throughput of the RPC service(s) you are calling. This is relevant also for thinking about SLAs.
  2. Related to the previous point, Kafka and Kafka Streams scale very well. If you are running at large scale, your Kafka Streams application might end up DDoS'ing your RPC service(s) because the latter probably can't scale as much as Kafka. You should be able to judge pretty easily whether or not this is a problem for you in practice.
  3. An RPC call (like from within map()) is a side-effect and thus a black box for Kafka Streams. The processing guarantees of Kafka Streams do not extend to such side effects.
    • Example: Kafka Streams (by default) processes data based on event-time (= based on when an event happened in the real world), so you can easily re-process old data and still get back the same results as when the old data was still new. But the RPC service you are calling during such reprocessing might return a different response than "back then". Ensuring the latter is your responsibility.
    • Example: In the case of failures, Kafka Streams will retry operations, and it will guarantee exactly-once processing (if enabled) even in such situations. But it can't guarantee, by itself, that an RPC call you are doing from within map() will be idempotent. Ensuring the latter is your responsibility (one way to do this is sketched right after this list).
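
To illustrate that last point: if the RPC is not a pure read-only lookup, one way to make it safe to repeat is to attach a stable, event-derived idempotency key that the service can deduplicate on. This is only a sketch under that assumption; the URL, header name, and parameters below are illustrative and not part of the original answer:

import java.net.URI;
import java.net.http.HttpRequest;

// Sketch only: build the enrichment request with a stable, event-derived idempotency key.
// If Kafka Streams retries or reprocesses the same record, the service receives the same
// key again and can deduplicate instead of performing the side effect twice.
static HttpRequest enrichmentRequest(String eventId, String jsonBody) {
    return HttpRequest.newBuilder()
            .uri(URI.create("https://reference-data-service.example.com/enrich"))
            .header("Idempotency-Key", eventId)   // identical across retries of the same event
            .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
            .build();
}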

Alternatives

In case you are wondering what other alternatives you have: if, for example, you are doing RPC calls to look up data (e.g. to enrich an incoming stream of events with side/context information), you can address the downsides above by making the lookup data available in Kafka directly. If the lookup data lives in MySQL, you can set up a Kafka connector to continuously ingest the MySQL data into a Kafka topic (think: CDC). In Kafka Streams, you can then read the lookup data into a KTable and perform the enrichment of your input stream via a stream-table join.
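
As a sketch of what the connector piece could look like, assuming Debezium's MySQL connector is used for the CDC part (the hostnames, table name, topic prefix, and exact property names below are assumptions and vary with the connector version), a Kafka Connect configuration along these lines would stream the table's changes into a topic that builder.table(...) can then read:

{
  "name": "reference-data-mysql-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql.internal.example.com",
    "database.port": "3306",
    "database.user": "cdc_user",
    "database.password": "********",
    "database.server.id": "1001",
    "topic.prefix": "refdata",
    "table.include.list": "refdb.reference_data",
    "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
    "schema.history.internal.kafka.topic": "refdata.schema-history"
  }
}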
