Return Seq[Row] from Spark-Scala UDF


Question

I am using Spark with Scala to do some data processing. I have XML data mapped to a DataFrame. I am passing a Row as a parameter to a UDF and trying to extract two complex-type objects as a list. Spark gives me the following error:

Exception in thread "main" java.lang.UnsupportedOperationException: Schema for type org.apache.spark.sql.Row is not supported

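import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.udf

// Throws when the UDF is created: Spark cannot derive a schema for the Seq[Row] return type.
def testUdf = udf((testInput: Row) => {
  val firstObject = testInput.getAs[Row]("Object1")
  val secondObject = testInput.getAs[Row]("Object2")
  val returnObject = Seq(firstObject, secondObject)

  returnObject
})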

Could you please tell me what I am doing wrong? Thanks.

Answer

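A Scala UDF cannot return Row objects, because Spark cannot infer a SQL schema from the Row type itself. The return type has to be one Spark can map to a SQL type: one of the types listed in the "Value type in Scala" column of the Data Types table in the Spark SQL documentation (other than Row), or a case class when the result is a struct.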
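If a UDF is still needed (for example, for logic the built-in functions cannot express), one workaround is to return a case class instead of a Row, since Spark can derive a struct schema from a case class. The sketch below is illustrative only: the case classes `Obj(name, value)` and `Record(Object1, Object2)` and the object `CaseClassUdf` are hypothetical stand-ins for the real XML-derived schema, and it assumes a Spark version whose typed `udf` accepts a `Row` argument for a struct column, as the question's own code does.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.{col, struct, udf}

// Hypothetical case classes mirroring the struct schema of Object1/Object2.
case class Obj(name: String, value: Int)
case class Record(Object1: Obj, Object2: Obj)

object CaseClassUdf {
  // The return type is Seq[Obj]: Spark can derive array<struct<name,value>> for it,
  // whereas Seq[Row] has no derivable schema and causes the UnsupportedOperationException.
  val toObjects: UserDefinedFunction = udf((input: Row) => {
    // Convert each nested struct (received as a Row) into the case class.
    def toObj(r: Row): Obj = Obj(r.getAs[String]("name"), r.getAs[Int]("value"))
    Seq(toObj(input.getAs[Row]("Object1")), toObj(input.getAs[Row]("Object2")))
  })

  // Passes both columns to the UDF as a single struct, so the UDF sees one Row.
  def apply(df: DataFrame): DataFrame =
    df.select(toObjects(struct(col("Object1"), col("Object2"))).as("objects"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("case-class-udf").getOrCreate()
    import spark.implicits._

    val df = Seq(Record(Obj("a", 1), Obj("b", 2))).toDF()
    apply(df).show(false)

    spark.stop()
  }
}
```

The UDF still returns structured data, but through a type Spark can describe. For simply collecting two columns into an array, the built-in `array` function is simpler and avoids UDF overhead.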

The good news is that there should be no need for a UDF here. If Object1 and Object2 have the same schema (it wouldn't work otherwise anyway), you can use the array function:

import org.apache.spark.sql.functions._

df.select(array(col("Object1"), col("Object2")))

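or, if Object1 and Object2 are not top-level columns (for example, fields nested inside a struct), use their dotted paths:

df.select(array(col("path.to.Object1"), col("path.to.Object2")))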
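A self-contained sketch of the `array` approach follows. It assumes a local `SparkSession` and the same kind of hypothetical `Obj`/`Record` case classes in place of the real XML schema; the helper name `objectsAsArray` and the wrapper column `outer` are also made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, col, struct}

// Hypothetical types standing in for the XML-derived structs in the question.
// Object1 and Object2 must share the same struct schema for array() to accept them.
case class Obj(name: String, value: Int)
case class Record(Object1: Obj, Object2: Obj)

object ArrayOfStructs {
  // Packs two struct columns into one array<struct<name,value>> column, no UDF needed.
  def objectsAsArray(df: DataFrame, first: String, second: String): DataFrame =
    df.select(array(col(first), col(second)).as("objects"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("array-of-structs").getOrCreate()
    import spark.implicits._

    val df = Seq(Record(Obj("a", 1), Obj("b", 2))).toDF()

    // Top-level columns: pass the column names directly.
    val result = objectsAsArray(df, "Object1", "Object2")
    result.printSchema()
    result.show(false)

    // Nested columns: dotted paths such as "outer.Object1" resolve fields inside a struct.
    val nested = df.select(struct(col("Object1"), col("Object2")).as("outer"))
    objectsAsArray(nested, "outer.Object1", "outer.Object2").show(false)

    spark.stop()
  }
}
```

Because `array` is a built-in expression, Spark keeps the structs in its internal format and never has to infer a schema from a Scala return type, which is exactly the step that fails for `Seq[Row]`.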
