Spark Error: expected zero arguments for construction of ClassDict (for numpy.core.multiarray._reconstruct)
Problem description
I have a DataFrame in Spark in which one of the columns contains an array. I have written a separate UDF that converts the array to another array containing only its distinct values. See the example below:
For example, [24, 23, 27, 23] should become [24, 23, 27].

Code:
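```python
import numpy as np
from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, IntegerType
def uniq_array(col_array):
    # np.unique returns a numpy.ndarray of NumPy integers, not a Python list
    x = np.unique(col_array)
    return x

uniq_array_udf = udf(uniq_array, ArrayType(IntegerType()))
Df3 = Df2.withColumn("age_array_unique", uniq_array_udf(Df2.age_array))
```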
In the code above, Df2.age_array is the array column to which I apply the UDF to produce a new column, "age_array_unique", that should contain only the unique values of the array.
However, as soon as I run Df3.show(), I get this error:
net.razorvine.pickle.PickleException: expected zero arguments for construction of ClassDict (for numpy.core.multiarray._reconstruct)
Can anyone please let me know why this is happening?
Thanks!
Recommended answer
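The source of the problem is that the object returned from the UDF doesn't conform to the declared type. np.unique not only returns a numpy.ndarray but also converts the numbers to the corresponding NumPy types, which are not compatible with the DataFrame API. (On the JVM side, Spark unpickles UDF results with the Pyrolite library, net.razorvine.pickle, which cannot rebuild NumPy objects; that is why the message mentions numpy.core.multiarray._reconstruct.) You can try something like this: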
```python
udf(lambda x: list(set(x)), ArrayType(IntegerType()))
```
Or this, which preserves the original order:
```python
from collections import OrderedDict

udf(lambda xs: list(OrderedDict((x, None) for x in xs)),
    ArrayType(IntegerType()))
```
Use either of these instead of np.unique.
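The order-preserving idea can also be written without OrderedDict. This is a minimal sketch, assuming Python 3.7+ (where plain dicts keep insertion order); the name uniq_array_ordered is hypothetical, and the None check is an added assumption for NULL array cells, which Spark hands to Python UDFs as None. Because it is plain Python, it can be checked without a Spark session:

```python
from typing import Iterable, List, Optional


def uniq_array_ordered(xs: Optional[Iterable[int]]) -> Optional[List[int]]:
    """Hypothetical helper: drop duplicates, keeping first-occurrence order."""
    # A NULL array cell arrives in a Python UDF as None; pass it through.
    if xs is None:
        return None
    # dict keys keep insertion order (Python 3.7+), so this matches the
    # OrderedDict version and returns a plain Python list.
    return list(dict.fromkeys(xs))


print(uniq_array_ordered([24, 23, 27, 23]))  # [24, 23, 27]
print(uniq_array_ordered(None))              # None

# With PySpark installed, it could be registered like the lambdas above:
# from pyspark.sql.functions import udf
# from pyspark.sql.types import ArrayType, IntegerType
# uniq_array_udf = udf(uniq_array_ordered, ArrayType(IntegerType()))
```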
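If you really want np.unique, you have to convert the output to a plain Python list with tolist(). Keep in mind that np.unique also sorts its result, so [24, 23, 27, 23] becomes [23, 24, 27] rather than [24, 23, 27]: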
```python
udf(lambda x: np.unique(x).tolist(), ArrayType(IntegerType()))
```
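To see the type mismatch outside Spark, here is a sketch of a revised version of the question's uniq_array that applies tolist(), followed by a check of the element types before and after conversion. It assumes only NumPy is installed; the None check is an added assumption for NULL array cells, not part of the original answer.

```python
import numpy as np


def uniq_array(col_array):
    # Spark passes a NULL array cell to a Python UDF as None
    if col_array is None:
        return None
    # np.unique returns a sorted numpy.ndarray of NumPy scalars;
    # tolist() turns it into a list of plain Python ints, which is what
    # ArrayType(IntegerType()) expects
    return np.unique(col_array).tolist()


raw = np.unique([24, 23, 27, 23])
# e.g. "ndarray int64": the container and elements are NumPy types
print(type(raw).__name__, type(raw[0]).__name__)

fixed = uniq_array([24, 23, 27, 23])
print(fixed, type(fixed[0]).__name__)  # [23, 24, 27] int

# With PySpark installed:
# uniq_array_udf = udf(uniq_array, ArrayType(IntegerType()))
```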