Create array of literals and columns from List of Strings in Spark SQL
Question
I am trying to define functions in Scala that take a list of strings as input and convert them into the columns passed to the DataFrame `array` arguments used in the code below.
import org.apache.spark.sql.functions.{array, lit}

val df = sc.parallelize(Array((1,1),(2,2),(3,3))).toDF("foo","bar")
val df2 = df
  .withColumn("columnArray", array(df("foo").cast("String"), df("bar").cast("String")))
  .withColumn("litArray", array(lit("foo"), lit("bar")))
More specifically, I would like to create functions colFunction and litFunction (or just one function if possible) that take a list of strings as an input parameter and can be used as follows:
val df = sc.parallelize(Array((1,1),(2,2),(3,3))).toDF("foo","bar")
val colString = List("foo","bar")
val df2 = df
  .withColumn("columnArray", array(colFunction(colString)))
  .withColumn("litArray", array(litFunction(colString)))
I have tried mapping the colString to an Array of columns with all the transformations, but this doesn't work. Any ideas on how this can be achieved? Many thanks for reading the question, and for any suggestions/solutions.
Answer
Spark 2.2+:

Support for Seq, Map and Tuple (struct) literals has been added in SPARK-19254. According to tests:
import org.apache.spark.sql.functions.typedLit
typedLit(Seq("foo", "bar"))
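For instance (a sketch, assuming the df defined in the question), typedLit can produce the literal array column in a single call:

```scala
import org.apache.spark.sql.functions.typedLit

// typedLit infers the Spark SQL type (here array<string>) from the Scala type
val df2 = df.withColumn("litArray", typedLit(Seq("foo", "bar")))
```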
Spark < 2.2

Just map with lit and wrap with array:
def asLitArray[T](xs: Seq[T]) = array(xs map lit: _*)
df.withColumn("an_array", asLitArray(colString)).show
// +---+---+----------+
// |foo|bar| an_array|
// +---+---+----------+
// | 1| 1|[foo, bar]|
// | 2| 2|[foo, bar]|
// | 3| 3|[foo, bar]|
// +---+---+----------+
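Tying this back to the question, the requested litFunction and colFunction could be sketched as follows (the function names come from the question; note that each function already returns the complete array column, so the extra array(...) wrapper in the question's desired usage is unnecessary):

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{array, col, lit}

// Array column of string literals: litFunction(Seq("foo","bar")) ~ array(lit("foo"), lit("bar"))
def litFunction(names: Seq[String]): Column = array(names map lit: _*)

// Array column built from existing columns, referenced by name
def colFunction(names: Seq[String]): Column = array(names map col: _*)

val colString = List("foo", "bar")
val df2 = df
  .withColumn("columnArray", colFunction(colString))
  .withColumn("litArray", litFunction(colString))
```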
Regarding the transformation from Seq[String] to a Column of type Array, this functionality is already provided by:
def array(colName: String, colNames: String*): Column
or
def array(cols: Column*): Column
Example:
val cols = Seq("bar", "foo")
cols match { case x :: xs => df.select(array(x, xs: _*)) }
// or
df.select(array(cols map col: _*))
Of course, all columns have to be of the same type.