Spark Error: expected zero arguments for construction of ClassDict (for numpy.core.multiarray._reconstruct)
Question
I have a dataframe in Spark in which one of the columns contains an array. I have written a UDF that converts the array to another array containing only its distinct values. See the example below:
Ex: [24, 23, 27, 23] should be converted to [24, 23, 27]
Code:
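import numpy as np
from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, IntegerType

def uniq_array(col_array):
    x = np.unique(col_array)
    return x

uniq_array_udf = udf(uniq_array, ArrayType(IntegerType()))
Df3 = Df2.withColumn("age_array_unique", uniq_array_udf(Df2.age_array))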
In the above code, Df2.age_array is the array column to which I apply the UDF to get a new column, "age_array_unique", which should contain only the unique values of the array.
However, as soon as I run the command Df3.show(), I get the error:
net.razorvine.pickle.PickleException: expected zero arguments for construction of ClassDict (for numpy.core.multiarray._reconstruct)
Can anyone please let me know why this is happening?
Thanks!
Recommended answer
The source of the problem is that the object returned from the UDF doesn't conform to the declared type. np.unique not only returns a numpy.ndarray but also converts the numbers to the corresponding NumPy types, which are not compatible with the DataFrame API. You can try something like this:
udf(lambda x: list(set(x)), ArrayType(IntegerType()))
or this (preserving order):
udf(lambda xs: list(OrderedDict((x, None) for x in xs)),
ArrayType(IntegerType()))
instead.
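The two alternatives above differ only in ordering: a set makes no promise about element order, while an OrderedDict keeps the first occurrence of each value in its original position. A minimal plain-Python sketch of that difference, runnable without Spark; the helper names dedupe_unordered and dedupe_ordered are hypothetical and used only for illustration:

```python
from collections import OrderedDict


def dedupe_unordered(xs):
    # set() drops duplicates but does not guarantee the original order
    return list(set(xs))


def dedupe_ordered(xs):
    # OrderedDict keeps keys in first-insertion order
    return list(OrderedDict((x, None) for x in xs))


ages = [24, 23, 27, 23]
ordered = dedupe_ordered(ages)
unordered = dedupe_unordered(ages)

print(ordered)            # [24, 23, 27]
print(sorted(unordered))  # [23, 24, 27]
```

Both helpers return a plain list of Python ints, which is the kind of value an ArrayType(IntegerType()) return type expects.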
If you really want np.unique, you have to convert the output:
udf(lambda x: np.unique(x).tolist(), ArrayType(IntegerType()))
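The type mismatch described in this answer can be checked outside Spark. The sketch below assumes only that NumPy is installed, and the variable names are illustrative. It compares the raw result of np.unique with the result after .tolist(). Note that np.unique also returns its values sorted, so the output order can differ from the input order.

```python
import numpy as np

ages = [24, 23, 27, 23]

raw = np.unique(ages)                  # numpy.ndarray of NumPy integer scalars
converted = np.unique(ages).tolist()   # plain list of Python ints

print(type(raw).__name__)           # ndarray
print(type(raw[0]).__module__)      # numpy
print(converted)                    # [23, 24, 27]
print(type(converted[0]).__name__)  # int
```

The elements of raw are NumPy scalars rather than Python ints, which is what the pickling error in the question points to. After .tolist() every element is a built-in int.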