How to create a udf in PySpark which returns an array of strings?

Views: 798 | Published: 2020/9/4 5:58:39 | Tags: python apache-spark pyspark apache-spark-sql user-defined-functions

Problem description

I have a UDF which returns a list of strings. This should not be too hard. I pass in the data type when executing the UDF, since it returns an array of strings: ArrayType(StringType).

Now, somehow this is not working:

The DataFrame I'm operating on is df_subsets_concat and looks like this:

df_subsets_concat.show(3,False)

+----------------------+
|col1                  |
+----------------------+
|oculunt               |
|predistposed          |
|incredulous           |
+----------------------+
only showing top 3 rows

The code is:

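from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, StringType

my_udf = lambda domain: ['s', 'n']
# StringType is passed here as a class, not an instance; this is what raises the AssertionError
label_udf = udf(my_udf, ArrayType(StringType))
df_subsets_concat_with_md = df_subsets_concat.withColumn('subset', label_udf(df_subsets_concat.col1))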

The result is:

/usr/lib/spark/python/pyspark/sql/types.py in __init__(self, elementType, containsNull)
    288         False
    289         """
--> 290         assert isinstance(elementType, DataType), "elementType should be DataType"
    291         self.elementType = elementType
    292         self.containsNull = containsNull

AssertionError: elementType should be DataType

It is my understanding that this was the correct way to do this. Here are some resources I consulted: 'pySpark Data Frames "assert isinstance(dataType, DataType), "dataType should be DataType"' and 'How to return a "Tuple type" in a UDF in PySpark?'.

But neither of these has helped me understand why this is not working. I am using PySpark 1.6.1.

How do I create a UDF in PySpark which returns an array of strings?
