Spark: Broadcast variables: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation


Problem description

class ProdsTransformer:

    def __init__(self):
        self.products_lookup_hmap = {}
        self.broadcast_products_lookup_map = None

    def create_broadcast_variables(self):
        # sc is the driver's SparkContext
        self.broadcast_products_lookup_map = sc.broadcast(self.products_lookup_hmap)

    def create_lookup_maps(self):
        # The code here builds the hashmap that maps Prod_ID to another space.
        pass

pt = ProdsTransformer()
pt.create_broadcast_variables()

pairs = distinct_users_projected.map(lambda x: (x.user_id,
                         pt.broadcast_products_lookup_map.value[x.Prod_ID]))

I get the following error:

"Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063."

Any help with how to deal with broadcast variables would be great!

Recommended answer

Because the map lambda references the object that holds your broadcast variable (pt), Spark attempts to serialize the whole object and ship it to the workers. Since the object contains a reference to the SparkContext, you get the error. Instead of this:

pairs = distinct_users_projected.map(lambda x: (x.user_id, pt.broadcast_products_lookup_map.value[x.Prod_ID]))

try this:

bcast = pt.broadcast_products_lookup_map
pairs = distinct_users_projected.map(lambda x: (x.user_id, bcast.value[x.Prod_ID]))

The second version no longer references the object (pt) inside the lambda, so Spark only needs to ship the broadcast variable.
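The capture behaviour behind this fix can be seen without Spark at all. The sketch below is only an illustration under stated assumptions: Holder is a hypothetical stand-in for ProdsTransformer, a threading.Lock stands in for a driver-only handle that cannot be pickled, and the standard pickle module stands in for Spark's own closure serializer. It inspects what each lambda closes over and checks whether those captured objects can be serialized.

```python
import pickle
import threading


class Holder:
    """Hypothetical stand-in for ProdsTransformer (not from the question)."""

    def __init__(self):
        self.lookup = {"p1": "A"}        # plays the role of the broadcast value
        self.context = threading.Lock()  # plays the role of an unpicklable driver-only handle


def captured_objects(fn):
    """Return the objects that a function's closure cells refer to."""
    return [cell.cell_contents for cell in (fn.__closure__ or ())]


def can_pickle(obj):
    """Report whether the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


def build_functions():
    holder = Holder()
    # Attribute access through the object: the lambda closes over `holder`.
    via_object = lambda key: holder.lookup[key]
    # Copy the attribute to a local first: the lambda closes over `lookup` only.
    lookup = holder.lookup
    via_local = lambda key: lookup[key]
    return via_object, via_local


via_object, via_local = build_functions()
object_capture_ok = all(can_pickle(o) for o in captured_objects(via_object))
local_capture_ok = all(can_pickle(o) for o in captured_objects(via_local))
print(object_capture_ok, local_capture_ok)
```

Both lambdas return the same lookup result, but only the second one closes over something that can be serialized on its own. That is the same reason assigning the broadcast variable to bcast before calling map helps.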
