Difference between RDDs and Batches in Spark?

Question

An RDD is a collection of elements partitioned across the nodes of the cluster. It is Spark's core component and abstraction.

Batches: the Spark Streaming API simply divides the data into batches, and those batches are likewise collections of streaming objects/elements. Depending on the requirements, a set of batches can be defined as a time-based batch window or as a window based on intensive online activity.

What exactly is the difference between RDDs and batches?

Answer

RDDs and batches are distinct but related concepts in Spark. As mentioned in the question, RDDs are a fundamental Spark concept: they form the base data structure for distributed computation in Spark.

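An RDD[T] is a virtual collection of elements of type T, distributed over partitions across a cluster. It is "virtual" in the sense that its contents are not materialized up front: transformations are recorded lazily and only computed when an action runs.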
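
To make the "partitioned, virtual collection" idea concrete, here is a toy, single-process Python model. It is not Spark's API: `ToyRDD`, `num_partitions`, `map` and `collect` here are hypothetical stand-ins. It keeps data as a list of partitions and defers `map` until `collect` is called, mimicking how Spark transformations stay lazy until an action runs.

```python
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


class ToyRDD(Generic[T]):
    """Hypothetical stand-in for RDD[T]: partitions plus a deferred per-partition function."""

    def __init__(self, partitions: List[list], compute: Callable[[list], list] = list) -> None:
        self._partitions = partitions  # source data, one list per partition
        self._compute = compute        # applied to each partition only when an action runs

    @property
    def num_partitions(self) -> int:
        return len(self._partitions)

    def map(self, f: Callable[[T], U]) -> "ToyRDD[U]":
        prev = self._compute
        # Transformation: nothing is evaluated here, the function is only composed.
        return ToyRDD(self._partitions, lambda part: [f(x) for x in prev(part)])

    def collect(self) -> List[T]:
        # Action: evaluate every partition (Spark would run one task per partition).
        return [x for part in self._partitions for x in self._compute(part)]


rdd = ToyRDD([[1, 2], [3], [4, 5, 6]])  # three partitions
doubled = rdd.map(lambda x: x * 2)      # lazy: no work done yet
print(doubled.num_partitions)           # 3
print(doubled.collect())                # [2, 4, 6, 8, 10, 12]
```

Composing functions instead of computing eagerly is what lets a real RDD be described by its lineage rather than by stored data.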

In Spark Streaming, a "batch" is the data collected during one batchInterval. Within a batch, received data is grouped into 'blocks': the receiver cuts a new block every spark.streaming.blockInterval (200 ms by default), so this parameter sets how much time each block covers rather than a fixed size in bytes.
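
The relationship between the two intervals is simple arithmetic: with, say, a 2-second batchInterval and a 200 ms block interval, one receiver produces at most 2000 / 200 = 10 blocks per batch. The toy Python model below is not Spark code; the function names are hypothetical, and it assumes a single receiver whose records carry arrival timestamps in milliseconds. It assigns each record of one batch to a block by arrival time.

```python
from collections import defaultdict
from typing import Any, List, Tuple


def blocks_per_batch(batch_interval_ms: int, block_interval_ms: int) -> int:
    """Maximum block slots one receiver has per batch (ceiling division), toy model."""
    return -(-batch_interval_ms // block_interval_ms)


def group_into_blocks(
    records: List[Tuple[int, Any]],
    batch_start_ms: int,
    batch_interval_ms: int,
    block_interval_ms: int,
) -> List[list]:
    """Split the (timestamp_ms, value) records of one batch into time-ordered blocks."""
    blocks = defaultdict(list)
    for ts, value in records:
        # Records outside [batch_start, batch_start + batch_interval) belong to other batches.
        if batch_start_ms <= ts < batch_start_ms + batch_interval_ms:
            blocks[(ts - batch_start_ms) // block_interval_ms].append(value)
    # Intervals that received no data yield no block in this model.
    return [blocks[i] for i in sorted(blocks)]


print(blocks_per_batch(2000, 200))  # 10
records = [(0, "a"), (150, "b"), (250, "c"), (1999, "d"), (2000, "e")]
print(group_into_blocks(records, 0, 2000, 200))  # [['a', 'b'], ['c'], ['d']]
```

Record `"e"` arrives exactly at 2000 ms, so it falls into the next batch rather than this one.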

These blocks are submitted to the Spark Core engine for processing: the set of blocks for each batch becomes one RDD, and each block becomes one partition of that RDD.
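
Continuing with a toy Python model (hypothetical names, not Spark's API), the mapping described above is one-to-one at both levels: each batch yields one RDD, and each block of that batch becomes one partition. Because Spark runs one task per partition, the number of blocks in a batch also determines how many tasks the first stage over that batch's RDD runs.

```python
from typing import Dict, List

Block = List[str]
Partition = List[str]


def batches_to_rdds(batches: Dict[int, List[Block]]) -> Dict[int, List[Partition]]:
    """Toy model: map each batch (keyed by start time in ms) to one 'RDD' = list of partitions.

    Every block of a batch becomes exactly one partition of that batch's RDD.
    """
    return {start: [list(block) for block in blocks] for start, blocks in sorted(batches.items())}


def tasks_per_batch(rdds: Dict[int, List[Partition]]) -> Dict[int, int]:
    """One task per partition, as Spark schedules the tasks of a stage."""
    return {start: len(partitions) for start, partitions in rdds.items()}


batches = {
    0: [["a", "b"], ["c"], ["d"]],  # batch [0 ms, 2000 ms): three blocks
    2000: [["e"]],                  # batch [2000 ms, 4000 ms): one block
}
rdds = batches_to_rdds(batches)
print(tasks_per_batch(rdds))  # {0: 3, 2000: 1}
```

In real Spark Streaming this conversion happens internally; application code typically reaches each batch's RDD through DStream operations such as `foreachRDD` or `transform`.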

It would be incorrect to say that batches and RDDs are the same thing: a Spark Streaming batch of data becomes an RDD when it is submitted to Spark Core for processing.