Kafka Streams and RPC: is calling REST service in map() operator considered an anti-pattern?

Problem description

The naive approach for implementing the use case of enriching an incoming stream of events stored in Kafka with reference data is to call, in the map() operator, an external REST service that provides this reference data for each incoming event.

eventStream.map((key, event) -> new KeyValue<>(key, /* query the external service here, then return the enriched event */));
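For illustration only, a minimal sketch of what such a synchronous lookup could look like using Java 11's java.net.http.HttpClient; the service URL, the lookup-by-key scheme, and the enrich() helper are hypothetical placeholders, not part of the original question:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

HttpClient httpClient = HttpClient.newHttpClient();

KStream<String, Object> enrichedStream = eventStream.mapValues((key, event) -> {
    // Hypothetical REST endpoint that returns reference data for the event key.
    HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://reference-service.local/reference-data/" + key))
            .GET()
            .build();
    try {
        // Synchronous (blocking) call to the external reference-data service.
        HttpResponse<String> response =
                httpClient.send(request, HttpResponse.BodyHandlers.ofString());
        // enrich() is a hypothetical helper that merges the event with the reference data.
        return enrich(event, response.body());
    } catch (Exception e) {
        throw new RuntimeException("Reference data lookup failed for key " + key, e);
    }
});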

Another approach is to have a second stream of events carrying the reference data and store it in a KTable, which acts as a lightweight embedded "database", then join the main event stream with it.

KStream<String, Object> eventStream = builder.stream(..., "event-topic");
KTable<String, Object> referenceDataTable = builder.table(..., "reference-data-topic");

KStream<String, Object> enrichedEventStream = eventStream
    .leftJoin(referenceDataTable, (event, referenceData) -> /* return the enriched event */)
    .map((key, enrichedEvent) -> new KeyValue<>(/* new key */, enrichedEvent));

enrichedEventStream.to("enriched-event-topic", ...);

Can the "naive" approach be considered an anti-pattern? Can the "KTable" approach be recommended as the preferred one?

Kafka can easily manage millions of messages per minute. A service called from the map() operator should be capable of handling high load too, and should also be highly available. These are extra requirements for the service implementation. But if the service satisfies these criteria, can the "naive" approach be used?

Recommended answer

Yes, it is OK to do RPC inside Kafka Streams operations such as map(). You just need to be aware of the pros and cons of doing so; see below. Also, you should make any such RPC calls synchronously from within your operations (I won't go into the details of why here; if needed, I'd suggest creating a new question).

Pros of doing RPC calls within Kafka Streams operations:

  • Your application will fit more easily into an existing architecture, for example one where the use of REST APIs and a request/response paradigm is common. This means you can get to a first proof-of-concept or MVP quickly.
  • In my experience, this approach is easier to understand for many developers, especially those just starting out with Kafka, because they are familiar with making RPC calls this way from past projects. Think of it as helping you move gradually from a request-response architecture to an event-driven architecture (powered by Kafka).
  • Nothing prevents you from starting with RPC calls and request-response, and migrating to a more Kafka-idiomatic approach later.

Cons:

  1. You are coupling the availability, scalability, and latency/throughput of your Kafka Streams powered application to the availability, scalability, and latency/throughput of the RPC service(s) you are calling. This is relevant also for thinking about SLAs.
  2. Related to the previous point, Kafka and Kafka Streams scale very well. If you are running at large scale, your Kafka Streams application might end up DDoS'ing your RPC service(s) because the latter probably can't scale as much as Kafka. You should be able to judge pretty easily whether or not this is a problem for you in practice.
  3. An RPC call (like from within map()) is a side-effect and thus a black box for Kafka Streams. The processing guarantees of Kafka Streams do not extend to such side effects.
    • Example: Kafka Streams (by default) processes data based on event-time (= based on when an event happened in the real world), so you can easily re-process old data and still get back the same results as when the old data was still new. But the RPC service you are calling during such reprocessing might return a different response than "back then". Ensuring the latter is your responsibility.
    • Example: In the case of failures, Kafka Streams will retry operations, and it will guarantee exactly-once processing (if enabled) even in such situations. But it can't guarantee, by itself, that an RPC call you are doing from within map() will be idempotent. Ensuring the latter is your responsibility.

Alternatives

In case you are wondering what other alternatives you have: if, for example, you are doing RPC calls to look up data (e.g. to enrich an incoming stream of events with side/context information), you can address the downsides above by making the lookup data available in Kafka directly. If the lookup data is in MySQL, you can set up a Kafka connector to continuously ingest the MySQL data into a Kafka topic (think: CDC). In Kafka Streams, you can then read the lookup data into a KTable and perform the enrichment of your input stream via a stream-table join.
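As an illustration only, the MySQL ingestion could be done with a CDC source connector such as Debezium's MySQL connector; the host, credentials, database, and table names below are placeholders, and exact property names vary between connector versions:

name=mysql-reference-data-connector
connector.class=io.debezium.connector.mysql.MySqlConnector
database.hostname=mysql.example.internal
database.port=3306
database.user=cdc_user
database.password=cdc_password
database.server.id=184054
database.server.name=reference-db
table.include.list=referencedb.reference_data
database.history.kafka.bootstrap.servers=kafka:9092
database.history.kafka.topic=schema-changes.reference-db

The resulting change topic can then be read with builder.table(...) and joined with the event stream exactly as in the KTable snippet from the question.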
