Return Seq[Row] from Spark-Scala UDF


Question

I am using Spark with Scala to do some data processing. I have XML data mapped to a DataFrame. I am passing a Row as a parameter to a UDF and trying to extract two complex-type objects from it as a list. Spark gives me the following error:

Exception in thread "main" java.lang.UnsupportedOperationException: Schema for type org.apache.spark.sql.Row is not supported

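import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.udf

// Calling udf(...) here throws the exception above: Spark cannot derive a schema for Seq[Row].
def testUdf = udf((testInput: Row) => {
  val firstObject = testInput.getAs[Row]("Object1")
  val secondObject = testInput.getAs[Row]("Object2")
  // Seq(...) builds the list; Seq[...] is type-application syntax and does not compile.
  Seq(firstObject, secondObject)
})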

Could you please tell me what I am doing wrong? Thanks.

Recommended answer

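A UDF cannot return Row objects. Spark derives the UDF's output schema from the Scala return type, and it has no way to do that for a generic Row. The return type has to be one of the types listed in the "Value type in Scala" column of the Data Types table in the Spark SQL documentation.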
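If a UDF really is needed, one way to satisfy this constraint is to return a type Spark can derive a schema for, such as a sequence of tuples. The following is only a sketch under assumptions that are not in the question: the column name `payload`, the fields `id` and `name`, and the helper name `toPairs` are all hypothetical, and the code is written script-style so it can be pasted into spark-shell.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.{col, struct, udf}

val spark = SparkSession.builder().master("local[*]").appName("udf-return-type").getOrCreate()
import spark.implicits._

// Hypothetical input: one struct column "payload" holding Object1 and Object2,
// each with the assumed fields (id: Int, name: String).
val df = Seq((1, "a", 2, "b")).toDF("id1", "name1", "id2", "name2")
  .select(struct(
    struct(col("id1").as("id"), col("name1").as("name")).as("Object1"),
    struct(col("id2").as("id"), col("name2").as("name")).as("Object2")
  ).as("payload"))

// The UDF still takes a Row, but returns Seq[(Int, String)],
// which Spark maps to array<struct<_1:int,_2:string>>.
val toPairs = udf((input: Row) => {
  val first = input.getAs[Row]("Object1")
  val second = input.getAs[Row]("Object2")
  Seq(first, second).map(r => (r.getAs[Int]("id"), r.getAs[String]("name")))
})

val result = df.select(toPairs(col("payload")).as("objects"))
result.printSchema()
```

The cost of this approach is that every field has to be copied out of the Row by hand, which is why a built-in function is usually preferable when one fits.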

The good news is that there should be no need for a UDF here. If Object1 and Object2 have the same schema (it wouldn't work otherwise anyway), you can use the array function:

import org.apache.spark.sql.functions._

df.select(array(col("Object1"), col("Object2")))

or, if Object1 and Object2 are not top-level columns:

df.select(array(col("path.to.Object1"), col("path.to.Object2")))
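To see the array approach end to end, here is a minimal sketch that builds a small DataFrame and combines two struct columns both as top-level columns and as nested fields. The data, the field names `id` and `name`, and the wrapper column `root` are assumptions made for illustration; the code is script-style and can be pasted into spark-shell.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.{array, col, struct}

val spark = SparkSession.builder().master("local[*]").appName("array-of-structs").getOrCreate()
import spark.implicits._

// Hypothetical data: Object1 and Object2 are struct columns with the same schema.
val topLevel = Seq((1, "a", 2, "b")).toDF("id1", "name1", "id2", "name2")
  .select(
    struct(col("id1").as("id"), col("name1").as("name")).as("Object1"),
    struct(col("id2").as("id"), col("name2").as("name")).as("Object2"))

// Top-level columns: combine them directly into one array column.
val combined = topLevel.select(array(col("Object1"), col("Object2")).as("objects"))

// Nested columns: wrap both structs in a hypothetical parent "root",
// then address them with a dotted path.
val nested = topLevel.select(struct(col("Object1"), col("Object2")).as("root"))
val combinedNested =
  nested.select(array(col("root.Object1"), col("root.Object2")).as("objects"))

combined.printSchema()
combinedNested.show(truncate = false)
```

Both selects yield a single column of type array of struct, so downstream code can treat the two objects uniformly without any UDF.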
