Spark : get Multiple DStream out of a single DStream


Question


Is it possible to get multiple DStreams out of a single DStream in Spark? My use case is as follows: I am getting a stream of log data from HDFS files. Each log line contains an id (id=xyz), and I need to process lines differently depending on that id, so I was trying to create a separate DStream for each id in the input DStream. I couldn't find anything related in the documentation. Does anyone know how this can be achieved in Spark, or can point to a link about it?

Thanks

Answer


You cannot split multiple DStreams out of a single DStream. The best you can do is:

  1. Modify your source system to emit a separate stream per id; you can then run a separate job to process each stream.
  2. If the source cannot be changed and gives you a single stream with mixed ids, write custom logic that identifies each id and then takes the appropriate action.


I would always prefer #1 as it is the cleaner solution, but there are exceptions for which #2 needs to be implemented.
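One way to sketch approach #2 is to extract the id from each line and derive one filtered stream per id with `DStream.filter`. Below is a minimal sketch of the id-extraction and routing logic; the regex, helper names, and sample log format are assumptions for illustration, not from the original post. The same predicate would be passed to `filter` in an actual Spark Streaming job:

```python
import re

# Assumed log format: lines carry "id=xyz" somewhere in the line.
ID_PATTERN = re.compile(r"\bid=(\w+)")

def extract_id(line):
    """Return the id embedded in a log line, or None if absent."""
    m = ID_PATTERN.search(line)
    return m.group(1) if m else None

def matches_id(target_id):
    """Build a predicate suitable for DStream.filter: keep lines for one id."""
    return lambda line: extract_id(line) == target_id

# In a Spark Streaming job this would look like (not run here):
#   lines = ssc.textFileStream("hdfs://...")
#   xyz_stream = lines.filter(matches_id("xyz"))
#   abc_stream = lines.filter(matches_id("abc"))

# Standalone demonstration on plain Python lists:
logs = [
    "2015-01-01 INFO id=xyz payload=1",
    "2015-01-01 WARN id=abc payload=2",
    "2015-01-01 INFO id=xyz payload=3",
]
xyz_lines = [l for l in logs if matches_id("xyz")(l)]
print(len(xyz_lines))  # 2
```

Note that one `filter` per id scans the stream once per id; if the set of ids is large or unknown up front, mapping each line to an `(id, line)` pair and grouping by key may be more practical than a fixed set of filtered streams.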

