Passing Functions to Spark: What is the risk of referencing the whole object?
Question
According to Passing Functions to Spark (https://spark.apache.org/docs/latest/programming-guide.html#resilient-distributed-datasets-rdds), it claims:
Accessing fields of the outer object will reference the whole object; to avoid this issue...
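import org.apache.spark.rdd.RDD

class MyClass {
  val field = "Hello"
  // `field` is really `this.field`, so the closure passed to map captures `this`
  def doStuff(rdd: RDD[String]): RDD[String] = { rdd.map(x => field + x) }
}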
Would referencing the whole object do any harm?
Recommended answer
This will cause Spark to serialize the whole object and send it to each of the executors. If some of the object's fields hold large amounts of data, this can be slow. It can also cause a "Task not serializable" exception if the object is not serializable.
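To see the capture without a cluster, here is a minimal sketch that uses plain Java serialization as a stand-in for what Spark has to do with a closure before shipping it to executors. The names `capturesThis`, `capturesLocal`, `ClosureCaptureDemo` and `canSerialize` are hypothetical helpers made up for this illustration, and `MyClass` is deliberately left non-serializable. The second method applies the workaround the Spark guide suggests: copy the field into a local variable so the closure no longer touches `this`.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Stand-in for the class in the question; deliberately NOT Serializable.
class MyClass {
  val field = "Hello"

  // Reads `field` through `this`, so the closure captures the whole instance.
  def capturesThis: String => String = x => field + x

  // Copies the field into a local val first, so only a String is captured.
  def capturesLocal: String => String = {
    val localField = field
    x => localField + x
  }
}

object ClosureCaptureDemo {
  // Hypothetical helper: true if `f` can be written with Java serialization.
  def canSerialize(f: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try { out.writeObject(f); true }
    catch { case _: NotSerializableException => false }
    finally out.close()
  }

  def main(args: Array[String]): Unit = {
    val obj = new MyClass
    println(canSerialize(obj.capturesThis))  // closure drags in MyClass
    println(canSerialize(obj.capturesLocal)) // closure holds only a String
  }
}
```

In a Spark job, the same idea amounts to writing `val localField = this.field` inside `doStuff` and then calling `rdd.map(x => localField + x)`. Making `MyClass` extend `Serializable` would avoid the exception too, but the whole object would still be shipped along with every task.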
Here is an example of someone who ran into this problem: Task not serializable: java.io.NotSerializableException when calling function outside closure only on classes not objects (http://stackoverflow.com/questions/22592811/task-not-serializable-java-io-notserializableexception-when-calling-function-ou)