Spark Error: expected zero arguments for construction of ClassDict (for numpy.core.multiarray._reconstruct)

Question

I have a dataframe in Spark in which one of the columns contains an array. I have written a UDF that converts the array into another array containing only its distinct values. See the example below:

Example: [24, 23, 27, 23] should be converted to [24, 23, 27]. Code:

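import numpy as np
from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, IntegerType

def uniq_array(col_array):
    # Returns a numpy.ndarray, which is what triggers the error below
    x = np.unique(col_array)
    return x
uniq_array_udf = udf(uniq_array, ArrayType(IntegerType()))

Df3 = Df2.withColumn("age_array_unique", uniq_array_udf(Df2.age_array))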

In the above code, Df2.age_array is the array column to which I apply the UDF to produce a new column, "age_array_unique", which should contain only the unique values of the array.

However, as soon as I run the command Df3.show(), I get the error:

net.razorvine.pickle.PickleException: expected zero arguments for construction of ClassDict (for numpy.core.multiarray._reconstruct)

Can anyone please let me know why this is happening?

Thanks!

Recommended answer

The source of the problem is that the object returned from the UDF doesn't conform to the declared type. np.unique not only returns a numpy.ndarray but also converts numerics to the corresponding NumPy types, which are not compatible with the DataFrame API. You can try something like this:

udf(lambda x: list(set(x)), ArrayType(IntegerType()))

or this (to keep order):

udf(lambda xs: list(OrderedDict((x, None) for x in xs)), 
    ArrayType(IntegerType()))

instead.
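As a rough sketch of the two pure-Python variants above, the helpers below can be checked without a Spark session. The names unique_any_order, unique_keep_order and make_unique_udf are hypothetical and not part of the original answer, and the None guard assumes that null array values reach a Python UDF as None.

```python
from collections import OrderedDict


def unique_any_order(xs):
    """Distinct values as a plain Python list; element order is not guaranteed."""
    if xs is None:  # assumption: a null array arrives as None
        return None
    return list(set(xs))


def unique_keep_order(xs):
    """Distinct values as a plain Python list, in order of first appearance."""
    if xs is None:  # assumption: a null array arrives as None
        return None
    return list(OrderedDict((x, None) for x in xs))


def make_unique_udf():
    """Wrap unique_keep_order as a Spark UDF (hypothetical helper).

    pyspark is imported lazily so the functions above stay usable without it.
    """
    from pyspark.sql.functions import udf
    from pyspark.sql.types import ArrayType, IntegerType

    return udf(unique_keep_order, ArrayType(IntegerType()))


print(unique_keep_order([24, 23, 27, 23]))  # [24, 23, 27]
```

Both helpers return a list of ordinary Python ints, which is what a UDF declared as ArrayType(IntegerType()) is expected to produce.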

If you really want np.unique, you have to convert the output:

udf(lambda x: np.unique(x).tolist(), ArrayType(IntegerType()))
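To see why the conversion matters, the following sketch compares what np.unique returns with what .tolist() produces. The function name unique_as_python_list is hypothetical. Note that np.unique also returns its result sorted, so the order of first appearance is not preserved.

```python
import numpy as np


def unique_as_python_list(values):
    """Sorted distinct values, converted to a plain list of Python ints."""
    return np.unique(values).tolist()


raw = np.unique([24, 23, 27, 23])
converted = unique_as_python_list([24, 23, 27, 23])

# raw is a numpy.ndarray whose elements are NumPy scalars, not Python ints
print(type(raw).__name__, isinstance(raw[0], np.integer))

# converted is a plain list of Python ints
print(type(converted).__name__, type(converted[0]).__name__)
print(converted)  # [23, 24, 27]
```

Passing unique_as_python_list to udf(..., ArrayType(IntegerType())) in place of the lambda above should behave the same way, since both return the .tolist() result.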
