"No attribute" error when passing a broadcast variable from PySpark to a Java function

Question

I have a Java class registered in PySpark, and I'm trying to pass a Broadcast variable from PySpark to a method in this class, like so:

from py4j.java_gateway import java_import
java_import(spark.sparkContext._jvm, "net.a.b.c.MyClass")
myPythonGateway = spark.sparkContext._jvm.MyClass()

with open("tests/fixtures/file.txt", "rb") as binary_file:
    data = spark.sparkContext.broadcast(binary_file.read())
    myPythonGateway.setData(data)

But this throws:

AttributeError: 'Broadcast' object has no attribute '_get_object_id'

However, if I pass the byte[] directly, without wrapping it in broadcast(), it works fine. But I need this variable to be broadcast, as it will be used repeatedly.

Answer

According to the py4j docs, the above error will be thrown if you try to pass a Python collection to a method that expects a Java collection. The docs give the following solution:

You can explicitly convert Python collections using one of the following converters located in the py4j.java_collections module: SetConverter, MapConverter, ListConverter.

The docs also include an example.
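
To see why a plain bytes value goes through while other Python objects trigger this error, here is a much-simplified, hypothetical model of how py4j encodes method arguments. A few built-in types (bool, int, float, bytes, str) are sent by value, and anything else is assumed to be a proxy for an existing Java object and is sent by reference through its `_get_object_id()` method. The real logic lives in `py4j.protocol.get_command_part` and handles more cases (None, Decimal, long integers, Python callback proxies); `encode_argument`, `FakeBroadcast` and `FakeJavaObject` are illustrative names, not py4j or PySpark APIs.

```python
# Hypothetical, simplified model of py4j's argument encoding.
# Not the real implementation (see py4j.protocol.get_command_part).


def encode_argument(arg):
    """Return a (kind, payload) pair describing how arg would be sent to the JVM."""
    if isinstance(arg, bool):  # checked before int, since bool subclasses int
        return ("boolean", arg)
    if isinstance(arg, int):
        return ("int", arg)
    if isinstance(arg, float):
        return ("double", arg)
    if isinstance(arg, (bytes, bytearray)):
        # Sent by value; this is why passing the raw file contents works.
        return ("bytes", bytes(arg))
    if isinstance(arg, str):
        return ("string", arg)
    # Fallback: treat the argument as a proxy for an existing Java object.
    # A plain Python object has no _get_object_id, so this raises AttributeError.
    return ("reference", arg._get_object_id())


class FakeBroadcast:
    """Hypothetical stand-in for a PySpark Broadcast: a plain Python object."""

    def __init__(self, value):
        self.value = value


class FakeJavaObject:
    """Hypothetical stand-in for a py4j JavaObject proxy."""

    def __init__(self, object_id):
        self._object_id = object_id

    def _get_object_id(self):
        return self._object_id


if __name__ == "__main__":
    print(encode_argument(b"\x00\x01"))            # ('bytes', b'\x00\x01')
    print(encode_argument(FakeJavaObject("o42")))  # ('reference', 'o42')
    try:
        encode_argument(FakeBroadcast(b"\x00\x01"))
    except AttributeError as e:
        print(e)  # 'FakeBroadcast' object has no attribute '_get_object_id'
```

In this model the wrapper object itself reaches the fallback branch, which matches the shape of the error in the question.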

Presumably, this error occurs when py4j tries to convert the value attribute of the Broadcast object, so converting the value first may fix the problem, e.g.:

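from py4j.java_collections import ListConverter

# Convert the file contents to a Java list before broadcasting it
with open("tests/fixtures/file.txt", "rb") as binary_file:
    converted_data = ListConverter().convert(binary_file.read(), spark.sparkContext._jvm._gateway_client)
broadcast_data = spark.sparkContext.broadcast(converted_data)
myPythonGateway.setData(broadcast_data)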
