Return Seq[Row] from Spark-Scala UDF
Question
I am using Spark with Scala to do some data processing. I have XML data mapped to a DataFrame. I am passing a Row as a parameter to a UDF and trying to extract two complex-typed objects as a list. Spark gives me the following error:
Exception in thread "main" java.lang.UnsupportedOperationException: Schema for type org.apache.spark.sql.Row is not supported
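import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.udf

def testUdf = udf((testInput: Row) => {
  val firstObject = testInput.getAs[Row]("Object1")
  val secondObject = testInput.getAs[Row]("Object2")
  // Seq(...) builds the list; returning Seq[Row] is what triggers the error above
  val returnObject = Seq(firstObject, secondObject)
  returnObject
})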
Could you please tell me what I am doing wrong? Thanks.
Recommended answer
A UDF defined this way cannot return Row objects, because Spark cannot derive a schema for Row. The return type has to be one of the types enumerated in the "Value type in Scala" column of the Data Types table.
The good news is that there should be no need for a UDF here. If Object1 and Object2 have the same schema (it wouldn't work otherwise anyway), you can use the array function:
import org.apache.spark.sql.functions._
df.select(array(col("Object1"), col("Object2")))
or
df.select(array(col("path.to.Object1"), col("path.to.Object2")))
if Object1 and Object2 are not top-level columns.
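To make the array approach concrete, here is a minimal, self-contained sketch. The question does not show the real XML-derived schema, so the sample data, the field names id and name, the wrapper column parent, and the object name ArrayOfStructsExample are all assumptions made up for illustration; only Object1, Object2, and the array and col functions come from the answer above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, col, struct}

object ArrayOfStructsExample {

  // Hypothetical input: two struct columns, Object1 and Object2, sharing one schema.
  def build(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq((1, "a", 2, "b"))
      .toDF("id1", "name1", "id2", "name2")
      .select(
        struct(col("id1").as("id"), col("name1").as("name")).as("Object1"),
        struct(col("id2").as("id"), col("name2").as("name")).as("Object2"))
  }

  // Top-level case: combine both structs into one array column, no UDF involved.
  def combine(df: DataFrame): DataFrame =
    df.select(array(col("Object1"), col("Object2")).as("objects"))

  // Nested case: the structs live under a parent struct (here the assumed name "parent"),
  // so they are addressed with dotted paths.
  def combineNested(df: DataFrame): DataFrame =
    df.select(struct(col("Object1"), col("Object2")).as("parent"))
      .select(array(col("parent.Object1"), col("parent.Object2")).as("objects"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("array-of-structs")
      .getOrCreate()

    val df = build(spark)
    combine(df).printSchema()        // objects: array of struct(id, name)
    combineNested(df).show(false)

    spark.stop()
  }
}
```

Because array requires all of its arguments to have the same type, this only works when both structs share field names and field types, which matches the condition stated in the answer.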