How to tell whether an operation in Spark is a transformation or an action?
Question
I've recently started learning Spark and am confused about transformations and actions. From the Spark documentation and a few books on Spark, I understand that an action causes a Spark job to be executed on the cluster, while a transformation does not. However, the RDD operations listed in Spark's API docs are not labeled as transformations or actions.
For example, reduce is an action, whereas reduceByKey is a transformation. Why is that?
Recommended answer
You can tell by looking at the return type. An action returns a non-RDD type (usually the type of the values stored in the RDD), whereas a transformation returns an RDD[Type], because the result is still only a representation of your computation. That is why reduce, which returns a single value, is an action, while reduceByKey, which returns an RDD of key/value pairs, is a transformation.
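The return-type rule can be illustrated with a small toy model. This is only a sketch in plain Python, not Spark's API: ToyRDD, parallelize, reduce_by_key and the tag helper are hypothetical names made up for the illustration. It assumes an RDD can be thought of as a stored recipe, so operations that return another ToyRDD run nothing, and operations that return a plain value run the recipe.

```python
from functools import reduce as fold


class ToyRDD:
    """Hypothetical stand-in for an RDD: it stores a recipe, not results."""

    def __init__(self, compute):
        # compute is a zero-argument function that yields the elements
        self._compute = compute

    @classmethod
    def parallelize(cls, items):
        data = list(items)
        return cls(lambda: iter(data))

    # Transformations: return a new ToyRDD and execute nothing.
    def map(self, f):
        return ToyRDD(lambda: (f(x) for x in self._compute()))

    def reduce_by_key(self, f):
        def run():
            merged = {}
            for key, value in self._compute():
                merged[key] = f(merged[key], value) if key in merged else value
            return iter(merged.items())
        return ToyRDD(run)

    # Actions: execute the recipe and return a plain (non-ToyRDD) value.
    def reduce(self, f):
        return fold(f, self._compute())

    def collect(self):
        return list(self._compute())


calls = []  # records every time the mapped function actually runs


def tag(word):
    calls.append(word)
    return (word, 1)


words = ToyRDD.parallelize(["a", "b", "a"])
pairs = words.map(tag)                                  # transformation
counts = pairs.reduce_by_key(lambda x, y: x + y)        # transformation
ran_before_action = list(calls)                         # still empty here

result = counts.collect()                               # action
total = pairs.map(lambda kv: kv[1]).reduce(lambda x, y: x + y)  # action

print(type(counts).__name__, ran_before_action)  # ToyRDD []
print(result)                                    # [('a', 2), ('b', 1)]
print(total)                                     # 3
```

In this sketch reduce_by_key hands back another ToyRDD, so nothing has run by the time ran_before_action is captured, while collect and reduce hand back an ordinary list and an ordinary integer, which is only possible once the computation has actually been executed.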