How to return a "Tuple type" in a UDF in PySpark?


Question

All the data types in pyspark.sql.types are:

__all__ = [
    "DataType", "NullType", "StringType", "BinaryType", "BooleanType", "DateType",
    "TimestampType", "DecimalType", "DoubleType", "FloatType", "ByteType", "IntegerType",
    "LongType", "ShortType", "ArrayType", "MapType", "StructField", "StructType"]

I have to write a UDF (in pyspark) which returns an array of tuples. What do I give as the second argument to the udf method, i.e. its return type? It would be something along the lines of ArrayType(TupleType())...

Answer

There is no such thing as a TupleType in Spark. Product types are represented as structs with fields of specific types. For example, if you want to return an array of pairs (integer, string), you can use a schema like this:

from pyspark.sql.types import *

schema = ArrayType(StructType([
    StructField("char", StringType(), False),
    StructField("count", IntegerType(), False)
]))

Example usage:

from pyspark.sql.functions import udf
from collections import Counter

# most_common() returns a list of (char, count) tuples,
# which Spark maps onto the array<struct> schema above
char_count_udf = udf(
    lambda s: Counter(s).most_common(),
    schema
)
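
In newer Spark releases (2.3+, going by the pyspark.sql.functions.udf documentation), the return type can also be passed as a DDL-formatted type string instead of building the StructType by hand; a minimal sketch of the same UDF under that assumption:

# Same UDF, with the schema given as a DDL-formatted string
# (assumes Spark 2.3+, where udf() also accepts a string returnType)
char_count_udf = udf(
    lambda s: Counter(s).most_common(),
    "array<struct<char:string,count:int>>"
)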

df = sc.parallelize([(1, "foo"), (2, "bar")]).toDF(["id", "value"])

df.select("*", char_count_udf(df["value"])).show(2, False)

## +---+-----+-------------------------+
## |id |value|PythonUDF#<lambda>(value)|
## +---+-----+-------------------------+
## |1  |foo  |[[o,2], [f,1]]           |
## |2  |bar  |[[r,1], [a,1], [b,1]]    |
## +---+-----+-------------------------+
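
Because each element of the returned array is a struct, its fields can be addressed by name. As a minimal sketch (the alias "pair" is just illustrative), explode unpacks the array into one row per (char, count) pair:

from pyspark.sql.functions import explode

pairs = df.select("id", explode(char_count_udf(df["value"])).alias("pair"))
pairs.select("id", "pair.char", "pair.count").show()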
