How do you set up multiple Spark Streaming jobs with different batch durations?


Problem description

We are in the beginning phases of transforming the current data architecture of a large enterprise and I am currently building a Spark Streaming ETL framework in which we would connect all of our sources to destinations (source/destinations could be Kafka topics, Flume, HDFS, etc.) through transformations. This would look something like:

SparkStreamingEtlManager.addEtl(Source, Transformation*, Destination)
SparkStreamingEtlManager.streamEtl()
streamingContext.start()
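
For context, here is a minimal sketch of what such a manager could look like on top of a single shared StreamingContext. The Source, Transformation and Destination abstractions and all names below are hypothetical, for illustration only, not from an existing library:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream

object EtlSketch {
  // Hypothetical abstractions, for illustration only.
  trait Source      { def read(ssc: StreamingContext): DStream[String] }
  trait Destination { def write(stream: DStream[String]): Unit }
  type Transformation = DStream[String] => DStream[String]

  class SparkStreamingEtlManager(ssc: StreamingContext) {
    // Register one ETL pipeline: source -> transformations -> destination.
    def addEtl(source: Source, transformations: Seq[Transformation], destination: Destination): Unit = {
      val input       = source.read(ssc)
      val transformed = transformations.foldLeft(input)((stream, t) => t(stream))
      destination.write(transformed)
    }

    // Starting the shared context starts every registered pipeline at once.
    def streamEtl(): Unit = ssc.start()
  }

  def main(args: Array[String]): Unit = {
    // One StreamingContext per application, hence one batch duration for all pipelines.
    val ssc     = new StreamingContext(new SparkConf().setAppName("spark-etl"), Seconds(10))
    val manager = new SparkStreamingEtlManager(ssc)
    // manager.addEtl(kafkaSource, Seq(parse, enrich), hdfsSink)
    manager.streamEtl()
    ssc.awaitTermination()
  }
}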

The assumption is that, since we should only have one SparkContext, we would deploy all of the ETL pipelines in one application/jar.

The problem with this is that the batchDuration is an attribute of the context itself and not of the ReceiverInputDStream (why is this?). Do we therefore need to have multiple Spark clusters, or allow for multiple SparkContexts and deploy multiple applications? Is there any other way to control the batch duration per receiver?
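
For reference, the batch duration is fixed when the StreamingContext is constructed, so every DStream created from that context inherits it; a minimal spark-shell style illustration (host names are placeholders):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// The batch duration is an argument of the context, not of any individual DStream.
val conf = new SparkConf().setAppName("batch-duration-demo")
val ssc  = new StreamingContext(conf, Seconds(5)) // 5-second batches for everything below

// Both receivers are driven by the same 5-second batch interval.
val streamA = ssc.socketTextStream("hostA", 9999)
val streamB = ssc.socketTextStream("hostB", 9999)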

Please let me know if any of my assumptions are naive or need to be rephrased. Thanks!

Answer

In my experience, different streams have different tuning requirements. Throughput, latency, capacity of the receiving side, SLAs to be respected, etc.

To cater for that multiplicity, we need to configure each Spark Streaming job to address its specific requirements: not only the batch interval but also resources such as memory and CPU, data partitioning, and the number of executor nodes (when the load is network bound).
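
As an illustration, each job would then carry its own SparkConf (or equivalent spark-submit flags); the application name and values below are placeholders chosen for the example:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Each Spark Streaming application gets its own resource and tuning profile.
val conf = new SparkConf()
  .setAppName("kafka-to-hdfs-etl")                      // placeholder job name
  .set("spark.executor.memory", "4g")                   // sized for this stream's volume
  .set("spark.executor.cores", "2")
  .set("spark.executor.instances", "6")                 // executors dedicated to this job
  .set("spark.streaming.backpressure.enabled", "true")  // throttle intake to the processing rate
  .set("spark.streaming.kafka.maxRatePerPartition", "1000")

// ...and its own batch interval, chosen for this pipeline's latency/throughput trade-off.
val ssc = new StreamingContext(conf, Seconds(30))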

It follows that each Spark Streaming job becomes a separate job deployment on a Spark cluster. That also allows the separate pipelines to be monitored and managed independently of each other, and helps with further fine-tuning of each process.

In our case, we use Mesos + Marathon to manage our set of Spark Streaming jobs running 3600x24x7.

