Is there a Spark built-in that flattens nested arrays?
Question
I have a DataFrame field that is a Seq[Seq[String]]. I built a UDF to transform said column into a column of Seq[String]; basically, a UDF for the flatten function from Scala.
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, udf}

def combineSentences(inCol: String, outCol: String): DataFrame => DataFrame = {
  // Flatten a nested sequence, treating a null input as an empty result
  def flatfunc(seqOfSeq: Seq[Seq[String]]): Seq[String] = seqOfSeq match {
    case null => Seq.empty[String]
    case _    => seqOfSeq.flatten
  }
  df => df.withColumn(outCol, udf(flatfunc _).apply(col(inCol)))
}
My use case is strings, but obviously, this could be generic. You can use this function in a chain of DataFrame transforms like:
df.transform(combineSentences(inCol, outCol))
Is there a Spark built-in function that does the same thing? I have not been able to find one.
Answer
There is such a function (since Spark 2.4), and it is called flatten:
import org.apache.spark.sql.functions.flatten
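A minimal sketch of swapping the UDF for the built-in (the column names "sentences" and "allWords" are illustrative, not from the original post):

```scala
import org.apache.spark.sql.functions.{col, flatten}

// Assuming df has a column "sentences" of type array<array<string>>,
// the built-in collapses one level of nesting, just like Seq.flatten:
val flattened = df.withColumn("allWords", flatten(col("sentences")))
```

One behavioral difference worth noting: the UDF above maps a null input to an empty Seq, whereas the built-in flatten yields null for a null input, so a coalesce may be needed if downstream code relies on the empty-sequence behavior.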