What is the result of RDD transformation in Spark?


Problem description


Can anyone explain what the result of an RDD transformation is? Is it a new set of data (a copy of the data), or is it only a new set of pointers to filtered blocks of the old data?

Recommended answer


RDD transformations allow you to create dependencies between RDDs. These dependencies are only the steps for producing results (a program). Each RDD in the lineage chain (the string of dependencies) has a function for computing its data and a pointer (dependency) to its parent RDD. Spark divides the RDD dependencies into stages and tasks and sends those to workers for execution.
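The idea of "a compute function plus a pointer to the parent" can be sketched in plain Scala. This is a hypothetical toy model (`MiniRDD`, `Source`, and `Mapped` are illustrative names, not Spark classes): each transformation only records a new node in the dependency chain, and data is produced only when `compute()` walks the chain.

```scala
// Toy model (NOT Spark's actual classes) of a lineage chain:
// each node stores its parent plus the function used to compute its data.
sealed trait MiniRDD[A] {
  def compute(): Seq[A]                                 // evaluates the parent chain on demand
  def map[B](f: A => B): MiniRDD[B] = Mapped(this, f)   // records a dependency; computes nothing
}

final case class Source[A](data: Seq[A]) extends MiniRDD[A] {
  def compute(): Seq[A] = data                          // root of the lineage chain
}

final case class Mapped[A, B](parent: MiniRDD[A], f: A => B) extends MiniRDD[B] {
  def compute(): Seq[B] = parent.compute().map(f)       // pull from the parent, then transform
}

val base    = Source(Seq(1, 2, 3))
val doubled = base.map(_ * 2)   // just builds a new node pointing at `base`
println(doubled.compute())      // List(2, 4, 6)
```

Here `doubled` holds no data of its own, only a reference to `base` and the function `_ * 2`; this mirrors how a transformed RDD is a description of work rather than a materialized dataset.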

So, if you do:

val lines = sc.textFile("...")
val words = lines.flatMap(line => line.split(" "))
val localwords = words.collect()


words will be an RDD containing a reference to the lines RDD. When the program is executed, lines' function will be executed first (load the data from the text file), then words' function will be executed on the resulting data (split the lines into words). Spark is lazy, so nothing gets executed until you call an action that triggers job creation and execution (collect in this example).
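That laziness can be imitated in plain Scala with `LazyList` — this is an analogy, not Spark itself: the mapping function below does not run when the "transformation" is declared, only when an "action" forces the result.

```scala
// Analogy only: LazyList's deferred evaluation mimics Spark's lazy transformations.
var calls = 0
val mapped = LazyList(1, 2, 3).map { x => calls += 1; x * 2 } // "transformation": nothing runs yet

println(calls)               // 0 -- the function has not been invoked

val result = mapped.toList   // "action": forces evaluation of the whole chain
println(calls)               // 3 -- now it ran once per element
println(result)              // List(2, 4, 6)
```

In the same way, `flatMap` on `words` above runs no splitting logic by itself; only `collect()` makes Spark schedule the job and execute the whole lineage.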


So an RDD (a transformed RDD, too) is not 'a set of data', but a step in a program (possibly the only step) telling Spark how to get the data and what to do with it.
