Spark Scala data frame UDF returning rows

Question

Say I have a DataFrame which contains a column (called colA) which is a Seq of Row. I want to append a new field to each record of colA. (And the new field depends on the existing record, so I have to write a UDF.) How should I write this UDF?

I have tried to write a UDF which takes colA as input and outputs a Seq[Row] where each record contains the new field. But the problem is that the UDF cannot return Seq[Row]; the exception is 'Schema for type org.apache.spark.sql.Row is not supported'. What should I do?

The UDF that I wrote: val convert = udf[Seq[Row], Seq[Row]](blablabla...), and the exception is: java.lang.UnsupportedOperationException: Schema for type org.apache.spark.sql.Row is not supported

Answer

Since Spark 2.0 you can create UDFs which return Row / Seq[Row], but you must provide the schema for the return type, e.g. if you work with an array of Doubles:

import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.types.{ArrayType, DoubleType}

// schema describing the data the UDF returns
val schema = ArrayType(DoubleType)

val myUDF = udf((s: Seq[Row]) => {
  s // just pass data without modification
}, schema)
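
Applied to the question's use case, a minimal sketch might look like the following. It assumes colA is an array of structs with a single Double field x, and appends a Boolean field derived from each record; the field names, the isPositive flag, and the DataFrame df are illustrative assumptions, not part of the original question.

import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{col, udf}
import org.apache.spark.sql.types._

// Output schema: the original field plus the appended one.
// Assumed input layout: colA is array<struct<x:double>>.
val outSchema = ArrayType(StructType(Seq(
  StructField("x", DoubleType),
  StructField("isPositive", BooleanType)
)))

val convert = udf((records: Seq[Row]) => {
  records.map { r =>
    val x = r.getDouble(0)
    Row(x, x > 0) // keep the original value, append the derived flag
  }
}, outSchema)

val result = df.withColumn("colA", convert(col("colA")))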

That said, I can't really imagine where this is useful; I would rather return tuples or case classes (or a Seq thereof) from the UDF.
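
For comparison, here is a sketch of the case-class variant under the same assumed layout; the Enriched class is hypothetical. Spark derives the result schema from the case class via reflection, so no explicit schema argument is needed:

import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.udf

// Spark infers the return schema from the case class,
// so the schema argument can be dropped.
case class Enriched(x: Double, isPositive: Boolean)

val convertTyped = udf((records: Seq[Row]) =>
  records.map { r =>
    val x = r.getDouble(0)
    Enriched(x, x > 0)
  }
)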

EDIT: It could be useful if your row contains more than 22 fields (tuples are limited to 22 fields; before Scala 2.11, case classes were as well).
