Packaging client side code to pass to an RDD


Question

A Scala function is passed into rdd.map(). The logic is too complex to include within the function itself, so it is encapsulated in a Scala object instead. That object is part of the application that instantiates the Spark context, as in the following example:

def func(s: String): String = {
   // LogicEngine is an object which, given a string, returns a different string
   LogicEngine.process(s)
}

val sc = new SparkContext(config)

val rdd = sc.textFile("…")

val rdd2 = rdd.map(func)

The question is: what is the correct way to do this so that LogicEngine is itself passed to the nodes on which the RDD is being processed (so that it lives together with the function code passed to the RDD), rather than sitting on the client?

Thanks

Recommended answer

That's what you have already. Each node will instantiate and use its own copy of LogicEngine when it's first accessed.
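
The "first accessed" behaviour can be seen without a cluster. The following is a minimal sketch, not the asker's actual code: the body of LogicEngine.process (a string reversal) and the Demo wrapper are hypothetical placeholders, and a local Seq stands in for the RDD so that the example runs without Spark.

```scala
// Hypothetical stand-in for the LogicEngine object from the question.
object LogicEngine {
  // A Scala object is initialized lazily: this line runs once per JVM,
  // the first time LogicEngine is referenced.
  println("LogicEngine initialized in this JVM")

  // Placeholder logic; the real transformation is not given in the question.
  def process(s: String): String = s.reverse
}

object Demo {
  // Same shape as the function passed to rdd.map in the question.
  def func(s: String): String = LogicEngine.process(s)

  def main(args: Array[String]): Unit = {
    // A local collection stands in for the RDD (assumption, to avoid a Spark dependency).
    val lines = Seq("abc", "spark")
    // LogicEngine is initialized on the first call to func, not before.
    val result = lines.map(func)
    println(result)
  }
}
```

With Spark, rdd.map(func) works the same way inside each executor JVM: the closure refers to the object by name rather than carrying a serialized copy, so the class only needs to be on the executors' classpath, typically by being part of the application jar.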
