What is the difference between an RDD partition and a slice?

Question

The Spark Programming Guide mentions slices as a feature of RDDs, for both parallelized collections and Hadoop datasets ("Spark will run one task for each slice of the cluster."). But the section on RDD persistence uses the concept of partitions without introducing it. Also, the RDD docs mention only partitions and never slices, while the SparkContext docs (http://spark.apache.org/docs/latest/api/core/index.html#org.apache.spark.SparkContext) mention slices for creating RDDs but partitions for running jobs on RDDs. Are these two concepts the same? If not, how do they differ?

The "Level of Parallelism" section of the Tuning guide says that "Spark automatically sets the number of 'map' tasks to run on each file according to its size ... and for distributed 'reduce' operations, such as groupByKey and reduceByKey, it uses the largest parent RDD's number of partitions. You can pass the level of parallelism as a second argument...." Does this explain the difference between partitions and slices? That is, are partitions related to RDD storage and slices related to the degree of parallelism, with slices calculated by default from either the data size or the number of partitions?

Recommended answer

They are the same thing. The documentation was fixed for Spark 1.2, thanks to Matthew Farrellee. More details are in the bug report: https://issues.apache.org/jira/browse/SPARK-1701
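
To make the equivalence concrete, here is a minimal pure-Python model of the idea. It is a sketch and not Spark code: the helper names split_into_partitions and run_job are hypothetical, and the even, contiguous split is an assumption made for illustration. It shows that the number of slices requested when a collection is parallelized is simply the number of partitions the resulting dataset has, and that one task runs per partition.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def split_into_partitions(data: Sequence[T], num_slices: int) -> List[List[T]]:
    """Split data into num_slices contiguous chunks of near-equal size.

    Hypothetical helper: models what "slicing" a local collection means.
    Each returned chunk plays the role of one partition.
    """
    if num_slices < 1:
        raise ValueError("num_slices must be at least 1")
    n = len(data)
    partitions = []
    for i in range(num_slices):
        # Integer arithmetic keeps chunk sizes within one element of each other.
        start = (i * n) // num_slices
        end = ((i + 1) * n) // num_slices
        partitions.append(list(data[start:end]))
    return partitions


def run_job(partitions: List[List[T]], task: Callable[[List[T]], R]) -> List[R]:
    """Run one task per partition (sequentially here, for simplicity)."""
    return [task(partition) for partition in partitions]


# Ask for 4 "slices"; the result is a dataset with 4 "partitions".
partitions = split_into_partitions(list(range(10)), num_slices=4)
partial_sums = run_job(partitions, sum)

print(len(partitions))    # number of partitions equals the requested slices
print(partitions)
print(sum(partial_sums))  # combining per-partition results gives the total
```

In Spark itself, the corresponding calls are SparkContext.parallelize, whose second argument is named numSlices, and the RDD's partition list (rdd.partitions in Scala, rdd.getNumPartitions() in PySpark), which reports that same number.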
