Apache Flink:如何计算 DataStream 中的事件总数 [英] Apache Flink: How to count the total number of events in a DataStream

查看:24
本文介绍了Apache Flink:如何计算 DataStream 中的事件总数的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有两个原始流,我正在加入这些流,然后我想计算已加入的事件总数和未加入的事件总数.我是通过在 joinedEventDataStream 上使用 map 来做到这一点的,如下所示

I have two raw streams and I am joining those streams and then I want to count what is the total number of events that have been joined and how much events have not. I am doing this by using map on joinedEventDataStream as shown below

joinedEventDataStream.map(new RichMapFunction<JoinedEvent, Object>() {

            @Override
            public Object map(JoinedEvent joinedEvent) throws Exception {

                number_of_joined_events += 1;

                return null;
            }
        });

问题 1:这是计算流中事件数量的合适方法吗?

Question # 1: Is this the appropriate way to count the number of events in the stream?

问题 2:我注意到一种连线行为,你们中的一些人可能不相信.问题是,当我在 IntelliJ IDE 中运行我的 Flink 程序时,它向我显示了 number_of_joined_events 的正确值,但在我将此程序作为 jar 提交的情况下显示了 0.因此,当我将程序作为 jar 文件而不是实际计数运行时,我得到了 number_of_joined_events 的初始值.为什么只有在 jar 文件提交的情况下才会发生这种情况,而不是在 IDE 中?

Question # 2: I have noticed a wired behavior, which some of you might not believe. The issue is that when I run my Flink program in IntelliJ IDE, it shows me correct value for number_of_joined_events but 0 in the case when I submit this program as jar. So I am getting the initial value of number_of_joined_events when I run the program as a jar file instead of the actual count. Why is this happening only in case of jar file submission and not in IDE?

推荐答案

您的方法不起作用.您在通过 JAR 文件执行程序时注意到的行为是正常的.

Your approach is not working. The behavior you noticed when executing the program via a JAR file is expected.

我不知道 number_of_joined_events 是如何定义的,但我假设它是您程序中的静态变量.当您在 IDE 中运行该程序时,它会在单个 JVM 中运行.因此,所有运算符都可以访问静态变量.当您向远程进程提交 JAR 文件时,该程序将在不同的 JVM(可能是多个 JVM)中执行,并且客户端进程中的静态变量永远不会更新.

I don't know how number_of_joined_events is defined, but I assume its a static variable in your program. When you run the program in your IDE, it runs in a single JVM. Hence, all operators have access to the static variable. When you submit a JAR file to a remote process, the program is executed in a different JVM (possibly multiple JVMs) and the static variable in your client process is never updated.

您可以使用 Flink 的指标或 ReduceFunction1s 相加来计算处理记录的数量.

You can use Flink's metrics or a ReduceFunction that sums 1s to count the number of processed records.

这篇关于Apache Flink:如何计算 DataStream 中的事件总数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆