Spark Streaming (Scala): window length by number of objects

Question

I am using Spark and Scala, and I would like to create a window operation whose length is set as a number of objects rather than a duration. That is, the window starts empty; as the stream starts, objects are stored in the window until it holds 10 objects, and when the 11th arrives the first is dropped.

Is this possible, or do I have to use another structure such as a list or an array? The documentation (http://spark.apache.org/docs/latest/streaming-programming-guide.html#window-operations) and some googling only refer to time-based windows (length and interval).

Thanks in advance.

Recommended answer

A window in Spark Streaming is characterized by windowDuration and, optionally, slideDuration, so it is a time window. You could instead consider Apache Flink, which supports both count windows and time windows. Compared to Spark, however, Flink follows a different streaming model: it processes incoming events as they arrive, whereas Spark processes events in micro-batches. As a result, Flink may have some restrictions. Give it a try and see whether it suits your needs.
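
If you stay with Spark, one possible approach is the one hinted at in the question: keep the last N objects in your own structure. The following is only a minimal sketch in plain Scala, with no Spark dependency. `CountWindow` and `CountWindowDemo` are hypothetical names introduced for illustration and are not part of Spark or Flink. It shows the intended behaviour: the window fills up to 10 objects, and the arrival of the 11th drops the oldest one.

```scala
import scala.collection.immutable.Queue

// Hypothetical helper (not a Spark API): an immutable window that keeps
// at most `capacity` of the most recently added elements.
final case class CountWindow[A](capacity: Int, items: Queue[A] = Queue.empty[A]) {
  require(capacity > 0, "capacity must be positive")

  // Append an element; if the window is already full, drop the oldest one first.
  def add(elem: A): CountWindow[A] = {
    val trimmed = if (items.size >= capacity) items.tail else items
    copy(items = trimmed.enqueue(elem))
  }

  // Current window contents, oldest first.
  def toList: List[A] = items.toList
}

object CountWindowDemo {
  def main(args: Array[String]): Unit = {
    // Simulate a stream of 12 objects flowing through a window of 10.
    val window = (1 to 12).foldLeft(CountWindow[Int](10))((w, x) => w.add(x))
    println(window.toList) // List(3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
  }
}
```

In a real streaming job, a structure like this would presumably have to be carried as state across micro-batches (for example through Spark Streaming's stateful operations such as updateStateByKey), and the order of objects inside a single batch would need to be well defined. Both points are assumptions that go beyond what the answer above states.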
