试图了解火花流窗口 [英] Trying to understand spark streaming windowing

查看:224
本文介绍了试图了解火花流窗口的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我调查星火流作为反欺诈服务我建立一个解决方案,但我在努力弄清楚究竟是如何将其应用到我的用例。用例是:从用户会话数据流,和一个风险评分被计算为一个给定的用户,数据10秒收集该用户之后。我计划使用2秒的间隔批次的时间,但需要从完整的10第二个窗口中使用的数据。起初,updateStateByKey()似乎是完美的解决方案,因为我可以建立使用该系统收集事件UserRisk对象。麻烦的是,我不知道如何告诉星火停止更新用户后,10秒钟过去了,截至10秒大关,我跑了反对UserRisk对象推理引擎,并持续的结果。另一种方法是在窗口的转变。与窗口改造的问题是,我必须手动dedup数据,这可能是一种浪费。如何告诉updateStateByKey停止减少的时间间隔后的某个键任何建议已经过去了?

I'm investigating Spark Streaming as a solution for an anti-fraud service I am building, but I am struggling to figure out exactly how to apply it to my use case. The use case is: data from a user session is streamed, and a risk score is calculated for a given user, after 10 seconds of data is collected for that user. I am planning on using a batch interval time of 2 seconds, but need to use data from the full 10 second window. At first, updateStateByKey() seemed to be the perfect solution, as I could build up a UserRisk object using the events the system collects. The trouble is, I am not sure how to tell Spark to stop updating a user after the 10 seconds have passed, as at the 10 second mark, I run our inference engine against the UserRisk object, and persist the result. The other approach is the window transformation. The issue with the window transformation is that I have to dedup data manually, which might be wasteful. Any suggestions on how to tell updateStateByKey to stop reducing on a certain key after an interval of time has passed?

推荐答案

根据你的情况,你可以尝试 reduceByKeyAndWindow DSTREAM功能,这将满足您的要求。

According to your case, you can try reduceByKeyAndWindow Dstream function, It will fulfill your requirement

这里是Java示例code

JavaPairDStream<String, Integer> counts = pairs.reduceByKeyAndWindow(
                new Function2<Integer, Integer, Integer>() {
                    public Integer call(Integer i1, Integer i2) {
                        return i1 + i2;
                    }
                }, new Function2<Integer, Integer, Integer>() {
                    public Integer call(Integer i1, Integer i2) {
                        return i1 - i2;
                    }
                }, new Duration(60 * 1000), new Duration(2 * 1000));

一些重要环节

http://spark.apache.org/docs/latest/streaming-programming-guide.html#window-operations

星火流窗口操作

这篇关于试图了解火花流窗口的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆