What is RDD in Spark?
Problem Description
The definition says:

RDD is an immutable distributed collection of objects.

I don't quite understand what this means. Is it like data (partitioned objects) stored on a hard disk? If so, how can an RDD hold user-defined classes (written in Java, Scala, or Python)?
From this link: https://www.safaribooksonline.com/library/view/learning-spark/9781449359034/ch03.html it mentions:
Users create RDDs in two ways: by loading an external dataset, or by distributing a collection of objects (e.g., a list or set) in their driver program.
I am really confused about RDDs in general, and about how they relate to Spark and Hadoop. Can someone please help?
An RDD is, essentially, Spark's representation of a set of data, spread across multiple machines, with an API that lets you act on it. An RDD can come from any data source, e.g. text files, a database via JDBC, etc.
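Both ways of creating an RDD that the Learning Spark quote mentions can be sketched in Scala. This is a minimal sketch, not code from the original post; the application name, the `local[*]` master, and the input path `data.txt` are assumptions for illustration:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddCreationDemo {
  def main(args: Array[String]): Unit = {
    // Run locally using all cores; on a real cluster the master URL
    // would point at YARN, Mesos, or a standalone Spark master instead.
    val conf = new SparkConf().setAppName("rdd-demo").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Way 1: distribute an in-memory collection from the driver program.
    val fromCollection = sc.parallelize(Seq(1, 2, 3, 4, 5))

    // Way 2: load an external dataset (the path here is hypothetical).
    val fromFile = sc.textFile("data.txt")

    // An RDD holds objects, not raw disk blocks: transformations run
    // your (user-defined) code on each element, in parallel.
    val doubled = fromCollection.map(_ * 2)
    println(doubled.collect().mkString(", ")) // prints: 2, 4, 6, 8, 10

    sc.stop()
  }
}
```

This also answers the "user-defined classes" part of the question: the elements of an RDD are ordinary JVM (or Python) objects, so `parallelize` works just as well on a `Seq` of your own case classes.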
The formal definition is:
RDDs are fault-tolerant, parallel data structures that let users explicitly persist intermediate results in memory, control their partitioning to optimize data placement, and manipulate them using a rich set of operators.
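Each clause of that definition corresponds to a concrete call in the RDD API. A hedged Scala sketch, assuming an existing `SparkContext` named `sc` (the word list here is made up for illustration):

```scala
import org.apache.spark.HashPartitioner
import org.apache.spark.storage.StorageLevel

// Assumption: `sc` is an already-created SparkContext.
val words = sc.parallelize(Seq("spark", "rdd", "spark", "hadoop"))
val counts = words.map(w => (w, 1)).reduceByKey(_ + _)

// "explicitly persist intermediate results in memory"
counts.persist(StorageLevel.MEMORY_ONLY)

// "control their partitioning to optimize data placement"
val repartitioned = counts.partitionBy(new HashPartitioner(4))

// "manipulate them using a rich set of operators"
val frequent = repartitioned.filter { case (_, n) => n > 1 }

// Fault tolerance comes from lineage: if a partition is lost, Spark
// recomputes it from the chain of transformations above rather than
// relying on data replication.
println(frequent.collect().toSeq) // ("spark", 2) is the only pair with n > 1
```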
If you want the full details on what an RDD is, read one of the core Spark academic papers, Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing.