Create array of literals and columns from List of Strings in Spark SQL


Problem Description

I am trying to define functions in Scala that take a list of strings as input and convert them into the columns passed to the DataFrame array arguments used in the code below.

val df = sc.parallelize(Array((1,1),(2,2),(3,3))).toDF("foo","bar")
val df2 = df
        .withColumn("columnArray",array(df("foo").cast("String"),df("bar").cast("String")))
        .withColumn("litArray",array(lit("foo"),lit("bar")))

More specifically, I would like to create functions colFunction and litFunction (or just one function, if possible) that take a list of strings as an input parameter and can be used as follows:

val df = sc.parallelize(Array((1,1),(2,2),(3,3))).toDF("foo","bar")
val colString = List("foo","bar")
val df2 = df
         .withColumn("columnArray",array(colFunction(colString))
         .withColumn("litArray",array(litFunction(colString)))

I have tried mapping colString to an Array of columns with all the transformations, but this doesn't work. Any ideas on how this can be achieved? Many thanks for reading the question, and for any suggestions/solutions.

Recommended Answer

Spark 2.2+:

Support for Seq, Map and Tuple (struct) literals was added in SPARK-19254 (https://issues.apache.org/jira/browse/SPARK-19254). According to the tests:

import org.apache.spark.sql.functions.typedLit

typedLit(Seq("foo", "bar"))

Spark < 2.2

Just map with lit and wrap with array:

import org.apache.spark.sql.functions.{array, lit}
def asLitArray[T](xs: Seq[T]) = array(xs map lit: _*)

df.withColumn("an_array", asLitArray(colString)).show
// +---+---+----------+
// |foo|bar|  an_array|
// +---+---+----------+
// |  1|  1|[foo, bar]|
// |  2|  2|[foo, bar]|
// |  3|  3|[foo, bar]|
// +---+---+----------+

Regarding the transformation from Seq[String] to a Column of type Array, this functionality is already provided by:

def array(colName: String, colNames: String*): Column 

def array(cols: Column*): Column

Example:

import org.apache.spark.sql.functions.{array, col}

val cols = Seq("bar", "foo")

cols match { case x :: xs => df.select(array(x, xs: _*)) }
// or 
df.select(array(cols map col: _*))

Of course, all columns have to be of the same type.
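
Putting this together with the setup from the question, here is a minimal sketch of the two requested helpers (the names colFunction and litFunction are taken from the question; the implementations below are one possible reading of it). Note that both helpers already return an array column, so there is no need to wrap them in another array(...) call:

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{array, col, lit}

// Array column of string literals built from a list of names.
def litFunction(names: Seq[String]): Column = array(names map lit: _*)

// Array column referencing the listed columns, cast to String as in the question's first example.
def colFunction(names: Seq[String]): Column = array(names map (c => col(c).cast("String")): _*)

val colString = List("foo", "bar")
val df2 = df
  .withColumn("columnArray", colFunction(colString))
  .withColumn("litArray", litFunction(colString))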
