Kafka: Consumer API vs Streams API


Question


I recently started learning Kafka and ended up with these questions.


  1. What is the difference between Consumer and Streams? To me, any tool or application that consumes messages from Kafka is a consumer in the Kafka world.


How is Streams different, given that it also consumes messages from or produces messages to Kafka? And why is it needed, when we can write our own consumer application using the Consumer API and process the messages as needed, or send them to Spark from the consumer application?


I Googled this but did not find any good answers. Sorry if this question is too trivial.

Answer


Update January 2021: I wrote a four-part blog series on Kafka fundamentals that I'd recommend reading for questions like these. For this question in particular, take a look at part 3 on processing fundamentals.


Update April 2018: Nowadays you can also use ksqlDB, the event streaming database for Kafka, to process your data in Kafka. ksqlDB is built on top of Kafka's Streams API, and it too comes with first-class support for Streams and Tables.


What is the difference between the Consumer API and the Streams API?


Kafka's Streams library (https://kafka.apache.org/documentation/streams/) is built on top of the Kafka producer and consumer clients. Kafka Streams is significantly more powerful and also more expressive than the plain clients.


It's much simpler and quicker to write a real-world application start to finish with Kafka Streams than with the plain consumer.
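As a rough illustration, a complete Streams application can be little more than configuration plus a few lines of topology code. This is a sketch assuming the `kafka-streams` dependency on the classpath; the broker address and topic names are placeholders:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class FilterApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        // application.id doubles as the consumer group id and state-store prefix
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "filter-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // Topology: read from one topic, drop empty values, write to another.
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> input = builder.stream("input-topic"); // hypothetical topics
        input.filter((key, value) -> value != null && !value.isEmpty())
             .to("output-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

Note there is no poll loop, offset handling, or rebalance listener in sight; the library manages all of that.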


Here are some of the features of the Kafka Streams API, most of which are not supported by the consumer client (it would require you to implement the missing features yourself, essentially re-implementing Kafka Streams).

  • Supports exactly-once processing semantics via Kafka transactions (what EOS means)
  • Supports fault-tolerant stateful (as well as stateless, of course) processing including streaming joins, aggregations, and windowing. In other words, it supports management of your application's processing state out-of-the-box.
  • Supports event-time processing as well as processing based on processing-time and ingestion-time. It also seamlessly processes out-of-order data.
  • Has first-class support for both streams and tables, which is where stream processing meets databases; in practice, most stream processing applications need both streams AND tables for implementing their respective use cases, so if a stream processing technology lacks either of the two abstractions (say, no support for tables) you are either stuck or must manually implement this functionality yourself (good luck with that...)
  • Supports interactive queries (also called 'queryable state') to expose the latest processing results to other applications and services via a request-response API. This is especially useful for traditional apps that can only do request-response, but not the streaming side of things.
  • Is more expressive: it ships with (1) a functional programming style DSL with operations such as map, filter, reduce as well as (2) an imperative style Processor API for e.g. doing complex event processing (CEP), and (3) you can even combine the DSL and the Processor API.
  • Has its own testing kit for unit and integration testing.
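To give a flavor of the DSL and the out-of-the-box state management from the list above, here is a hedged fragment (it would live inside a topology-building method; the topic name and window size are illustrative, and `ofSizeWithNoGrace` is the newer windowing API):

```java
import java.time.Duration;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.kstream.Windowed;

StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> clicks = builder.stream("clicks"); // hypothetical topic

// Count events per key in tumbling 5-minute windows. The backing state
// store, its changelog topic, and fault-tolerant recovery after a
// rebalance are all managed by the library, not by your code.
KTable<Windowed<String>, Long> counts = clicks
    .groupByKey()
    .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5)))
    .count();
```

With the plain consumer client, the equivalent would mean building and recovering that windowed state yourself.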


See http://docs.confluent.io/current/streams/introduction.html for a more detailed but still high-level introduction to the Kafka Streams API, which should also help you to understand the differences to the lower-level Kafka consumer client.


Beyond Kafka Streams, you can also use the streaming database ksqlDB to process your data in Kafka. ksqlDB separates its storage layer (Kafka) from its compute layer (ksqlDB itself; it uses Kafka Streams for most of its functionality here). It supports essentially the same features as Kafka Streams, but you write streaming SQL statements instead of Java or Scala code. You can interact with ksqlDB via a UI, CLI, and a REST API; it also has a native Java client in case you don't want to use REST. Lastly, if you prefer not having to self-manage your infrastructure, ksqlDB is available as a fully managed service in Confluent Cloud.


So how is the Kafka Streams API different, given that it also consumes messages from or produces messages to Kafka?


Yes, the Kafka Streams API can both read data from and write data to Kafka. It supports Kafka transactions, so you can, for example, read one or more messages from one or more topics, optionally update processing state if you need to, and then write one or more output messages to one or more topics, all as one atomic operation.
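In Kafka Streams, that atomic read-process-write behavior is a configuration switch rather than hand-written transaction code. A sketch, assuming a `props` object as in a normal Streams setup (`EXACTLY_ONCE_V2` requires brokers 2.5 or newer; older clients used the now-deprecated `EXACTLY_ONCE`):

```java
// Runs every read-process-write cycle as a Kafka transaction:
// input offsets, state-store changelog writes, and output messages
// are committed together or not at all.
props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);
```

Doing the same with the plain clients means managing `initTransactions`, `beginTransaction`, `sendOffsetsToTransaction`, and commit/abort paths yourself.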


And why is it needed, when we can write our own consumer application using the Consumer API and process the messages as needed, or send them to Spark from the consumer application?


Yes, you could write your own consumer application -- as I mentioned, the Kafka Streams API uses the Kafka consumer client (plus the producer client) itself -- but you'd have to manually implement all the unique features that the Streams API provides. See the list above for everything you get "for free". It is thus a rare circumstance that a user would pick the plain consumer client rather than the more powerful Kafka Streams library.
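For comparison, this is roughly the baseline a plain-consumer application starts from (a sketch assuming the `kafka-clients` dependency; the broker address, group id, and topic name are placeholders). Everything in the feature list above, such as state stores, windowing, joins, and exactly-once semantics, would have to be layered on top of this loop by hand:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class PlainConsumerApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-plain-app");            // placeholder group
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("input-topic")); // hypothetical topic
            while (true) {
                for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofMillis(500))) {
                    // Hand-rolled processing goes here; any stateful logic,
                    // windowing, or transactional output is yours to build.
                    System.out.printf("%s -> %s%n", rec.key(), rec.value());
                }
            }
        }
    }
}
```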
