Apache Spark: map vs mapPartitions?


Question

What's the difference between an RDD's map and mapPartitions methods? And does flatMap behave like map or like mapPartitions? Thanks.

(Edit) That is, what is the difference between:

  def map[A, B](rdd: RDD[A], fn: (A => B))
               (implicit a: Manifest[A], b: Manifest[B]): RDD[B] = {
    rdd.mapPartitions({ iter: Iterator[A] => for (i <- iter) yield fn(i) },
      preservesPartitioning = true)
  }

And:

  def map[A, B](rdd: RDD[A], fn: (A => B))
               (implicit a: Manifest[A], b: Manifest[B]): RDD[B] = {
    rdd.map(fn)
  }

Recommended answer

What's the difference between an RDD's map and mapPartitions method?

The map method converts each element of the source RDD into exactly one element of the result RDD by applying a function to it. mapPartitions converts each partition of the source RDD into any number of result elements (possibly none): its function receives an iterator over the whole partition and returns an iterator of results.
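The contrast can be sketched in a few lines. This is only an illustration, assuming a local Spark installation on the classpath; the object name, the helper names doubleEach, sumPerPartition and dropAll, and the sample data are hypothetical and not part of the original answer.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical helper object for illustration only.
object MapVsMapPartitions {
  // map: the function sees one element and returns exactly one element.
  def doubleEach(rdd: RDD[Int]): RDD[Int] = rdd.map(_ * 2)

  // mapPartitions: the function sees an iterator over a whole partition and
  // returns an iterator of any length (here, a single sum per partition).
  def sumPerPartition(rdd: RDD[Int]): RDD[Int] =
    rdd.mapPartitions(iter => Iterator(iter.sum))

  // Returning an empty iterator produces no elements for that partition.
  def dropAll(rdd: RDD[Int]): RDD[Int] =
    rdd.mapPartitions(_ => Iterator[Int]())

  def main(args: Array[String]): Unit = {
    // Local two-thread context; the app name is arbitrary.
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("map-vs-mapPartitions"))
    try {
      // Six elements split across two partitions: [1, 2, 3] and [4, 5, 6].
      val rdd = sc.parallelize(1 to 6, numSlices = 2)

      println(doubleEach(rdd).collect().mkString(", "))      // 2, 4, 6, 8, 10, 12
      println(sumPerPartition(rdd).collect().mkString(", ")) // 6, 15
      println(dropAll(rdd).count())                          // 0
    } finally sc.stop()
  }
}
```

Six input elements give six outputs under map, while mapPartitions gives one output per partition in the second helper and none at all in the third.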

And does flatMap behave like map or like mapPartitions?

Neither. flatMap works on a single element at a time (like map) and can produce multiple elements of the result, or none, for each one (like mapPartitions).
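A small sketch of that behaviour, under the same assumption of a local Spark installation; the object name, the helper repeatEach and the input values are made up for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical helper object for illustration only.
object FlatMapExample {
  // flatMap: the function is called once per element (like map) but returns a
  // collection, so each input can contribute zero, one, or many outputs.
  def repeatEach(rdd: RDD[Int]): RDD[Int] =
    rdd.flatMap(n => Seq.fill(n)(n))

  def main(args: Array[String]): Unit = {
    // Local two-thread context; the app name is arbitrary.
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("flatMap-example"))
    try {
      val rdd = sc.parallelize(0 to 3, numSlices = 2)

      // 0 contributes nothing, 1 one element, 2 two elements, 3 three elements.
      println(repeatEach(rdd).collect().mkString(", ")) // 1, 2, 2, 3, 3, 3
    } finally sc.stop()
  }
}
```

The function never sees more than one input element, yet the result RDD has a different number of elements than the source.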
