Spark Error: expected zero arguments for construction of ClassDict (for numpy.core.multiarray._reconstruct)


Question

I have a DataFrame in Spark in which one of the columns contains an array. I have written a UDF that converts the array into another array containing only its distinct values. See the example below:

For example, [24, 23, 27, 23] should be converted to [24, 23, 27]. Code:

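import numpy as np
from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, IntegerType

def uniq_array(col_array):
    x = np.unique(col_array)
    return x
uniq_array_udf = udf(uniq_array, ArrayType(IntegerType()))

Df3 = Df2.withColumn("age_array_unique", uniq_array_udf(Df2.age_array))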

In the code above, Df2.age_array is the array column to which I apply the UDF. The result goes into a new column, "age_array_unique", which should contain only the unique values of each array.

However, as soon as I run Df3.show(), I get this error:

net.razorvine.pickle.PickleException: expected zero arguments for construction of ClassDict (for numpy.core.multiarray._reconstruct)

Can anyone tell me why this is happening?

Thanks!

Recommended answer

The source of the problem is that the object returned from the UDF doesn't conform to the declared type. np.unique not only returns a numpy.ndarray but also converts the numbers to the corresponding NumPy types, which are not compatible with the DataFrame API. The declared return type is ArrayType(IntegerType()), so Spark expects a plain Python list of Python ints. You can try something like this:

udf(lambda x: list(set(x)), ArrayType(IntegerType()))

or this (preserving order):

# requires: from collections import OrderedDict
udf(lambda xs: list(OrderedDict((x, None) for x in xs)),
    ArrayType(IntegerType()))

instead.
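The two lambdas above are ordinary Python functions, so their behavior can be checked without a Spark session. The following is a minimal sketch of that check. The names distinct_unordered and distinct_ordered are hypothetical and were chosen for illustration; they are not part of the original answer.

```python
from collections import OrderedDict


def distinct_unordered(xs):
    # set() removes duplicates but makes no promise about element order
    return list(set(xs))


def distinct_ordered(xs):
    # OrderedDict keeps only the first occurrence of each key,
    # in the order the keys were first inserted
    return list(OrderedDict((x, None) for x in xs))


ages = [24, 23, 27, 23]
unordered = distinct_unordered(ages)
ordered = distinct_ordered(ages)

# Sort the set-based result before printing, since its order is not guaranteed
print(sorted(unordered))
print(ordered)
```

Both functions return a plain list of Python ints, which is what ArrayType(IntegerType()) expects. Only the OrderedDict variant is guaranteed to keep the elements in their original order.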

If you really want np.unique, you have to convert the output:

udf(lambda x: np.unique(x).tolist(), ArrayType(IntegerType()))
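To see why the conversion matters, the types can be inspected outside Spark. This is a small sketch that assumes only that NumPy is installed; the variable names are illustrative.

```python
import numpy as np

ages = [24, 23, 27, 23]

raw = np.unique(ages)                  # numpy.ndarray holding NumPy integer scalars
converted = np.unique(ages).tolist()   # plain list holding Python ints

# The NumPy scalar type name depends on the platform (for example int64)
print(type(raw).__name__, type(raw[0]).__name__)
print(type(converted).__name__, type(converted[0]).__name__)
print(converted)
```

Note that np.unique returns the unique values sorted, so this variant yields [23, 24, 27] rather than the [24, 23, 27] from the question. If the original order matters, the order-preserving approach is the better fit.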
