Spark: get multiple DStreams out of a single DStream
Question
Is it possible to get multiple DStreams out of a single DStream in Spark? My use case is as follows: I am getting a stream of log data from HDFS files. Each log line contains an id (id=xyz), and I need to process the line differently depending on that id. So I was trying to create a separate DStream for each id in the input DStream. I couldn't find anything related in the documentation. Does anyone know how this can be achieved in Spark, or can point me to a link about it?
Thanks
Answer
You cannot split a single DStream into multiple DStreams. The best you can do is:
- Modify your source system to publish a separate stream per id; then you can have a separate job processing each stream.
- If your source cannot be changed and hands you a stream with mixed ids, write custom logic that identifies the id in each line and then performs the appropriate processing.
I would always prefer #1, as it is the cleaner solution, but there are cases where #2 has to be implemented.
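Option #2 typically boils down to a per-id filter predicate applied to the incoming stream (with `DStream.filter`, one filtered stream per id of interest). Below is a minimal sketch of that routing logic in plain Python, with a list standing in for the DStream; the log-line format and the `has_id` helper are illustrative assumptions, not from the original question.

```python
import re

# Hypothetical sample log lines; in Spark Streaming these would arrive as a
# DStream, e.g. from streamingContext.textFileStream(hdfs_dir).
log_lines = [
    "2016-01-01 10:00:00 id=abc message one",
    "2016-01-01 10:00:01 id=xyz message two",
    "2016-01-01 10:00:02 id=abc message three",
]

ID_PATTERN = re.compile(r"\bid=(\w+)")

def has_id(line, wanted):
    """Predicate usable with DStream.filter: True if `line` carries `wanted` id."""
    match = ID_PATTERN.search(line)
    return match is not None and match.group(1) == wanted

# With a real DStream the same predicate would be used per id:
#   xyz_stream = lines.filter(lambda l: has_id(l, "xyz"))
# Here plain list comprehensions stand in for the filtered streams.
xyz_lines = [line for line in log_lines if has_id(line, "xyz")]
abc_lines = [line for line in log_lines if has_id(line, "abc")]
```

Each filtered stream can then get its own processing pipeline, which is effectively the "custom logic" the answer describes; the cost is that every filter scans the full input stream.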