Kafka Streams API: KStream to KTable
Question
I have a Kafka topic where I send location events (key=user_id, value=user_location). I am able to read and process it as a KStream:
KStreamBuilder builder = new KStreamBuilder();
KStream<String, Location> locations = builder
    .stream("location_topic")
    .map((k, v) -> {
        // some processing here, omitted for clarity
        Location location = new Location(lat, lon);
        return new KeyValue<>(k, location);
    });
That works well, but I'd like to have a KTable with the last known position of each user. How could I do it?
I am able to do it by writing to and reading from an intermediate topic:
// write to intermediate topic
locations.to(Serdes.String(), new LocationSerde(), "location_topic_aux");
// build KTable from intermediate topic
KTable<String, Location> table = builder.table("location_topic_aux", "store");
Is there a simple way to obtain a KTable from a KStream? This is my first app using Kafka Streams, so I'm probably missing something obvious.
Answer
Update:

In Kafka 2.5, a new method KStream#toTable() was added, which provides a convenient way to transform a KStream into a KTable. For details see: https://cwiki.apache.org/confluence/display/KAFKA/KIP-523%3A+Add+KStream%23toTable+to+the+Streams+DSL
Original answer:
There is no straightforward way to do this at the moment. Your approach is absolutely valid, as discussed in the Confluent FAQ: http://docs.confluent.io/current/streams/faq.html#how-can-i-convert-a-kstream-to-a-ktable-without-an-aggregation-step
This is the simplest approach with regard to the code. However, it has the disadvantages that (a) you need to manage an additional topic and (b) it results in additional network traffic because data is written to and re-read from Kafka.
有一种选择,使用虚拟减少":
There is one alternative, using a "dummy-reduce":
KStreamBuilder builder = new KStreamBuilder();
KStream<String, Long> stream = ...; // some computation that creates the derived KStream

KTable<String, Long> table = stream.groupByKey().reduce(
    new Reducer<Long>() {
        @Override
        public Long apply(Long aggValue, Long newValue) {
            return newValue; // always keep the newest value per key
        }
    },
    "dummy-aggregation-store");
This approach is somewhat more complex with regard to the code compared to option 1, but has the advantages that (a) no manual topic management is required and (b) re-reading the data from Kafka is not necessary.
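To see why the dummy reduce yields the latest position per key, here is a plain-Java simulation of its semantics (no Kafka involved; the reducer logic is the same "always return newValue", and the class and method names are illustrative only):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

public class DummyReduceDemo {
    // Same logic as the dummy Reducer above: discard the aggregate, keep the newest value
    static final BinaryOperator<Long> KEEP_NEWEST = (aggValue, newValue) -> newValue;

    // Collapse a stream of (key, value) updates into a "table" with one row per key,
    // applying the reducer whenever a key is seen again
    static Map<String, Long> latestByKey(List<Map.Entry<String, Long>> events) {
        Map<String, Long> table = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : events) {
            table.merge(e.getKey(), e.getValue(), KEEP_NEWEST);
        }
        return table;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> events = List.of(
                Map.entry("alice", 1L),
                Map.entry("bob", 10L),
                Map.entry("alice", 2L),
                Map.entry("alice", 3L));
        // Each key ends up with its last value only
        System.out.println(latestByKey(events)); // {alice=3, bob=10}
    }
}
```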
Overall, you need to decide for yourself which approach you like better:
In option 2, Kafka Streams will create an internal changelog topic to back up the KTable for fault tolerance. Thus, both approaches require some additional storage in Kafka and result in additional network traffic. Overall, it's a trade-off between slightly more complex code in option 2 versus manual topic management in option 1.