How to convert column of arrays of strings to strings?

Question

I have a column of type array<string> in Spark tables. I am using SQL to query these Spark tables. I want to convert the array<string> into a string.

When I use the following syntax:

select cast(rate_plan_code  as string) as new_rate_plan  from
customer_activity_searches group by rate_plan_code

The rate_plan_code column has the following values:

["AAA","RACK","SMOBIX","SMOBPX"] 
["LPCT","RACK"]
["LFTIN","RACK","SMOBIX","SMOBPX"]
["LTGD","RACK"] 
["RACK","LEARLI","NHDP","LADV","LADV2"]

The new_rate_plan column is populated with the following:

org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@e4273d9f
org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@c1ade2ff
org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@4f378397
org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@d1c81377
org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@552f3317

Cast seems to work when I am converting decimal to int or int to double, but not in this case. I am curious why the cast is not working here. I would greatly appreciate your help.

Recommended answer

In Spark 2.1+, to concatenate the values in a single Array column, you can use one of the following:

  1. concat_ws standard function
  2. map operator
  3. a user-defined function (UDF)

concat_ws Standard Function

Use the concat_ws standard function.

concat_ws(sep: String, exprs: Column*): Column Concatenates multiple input string columns together into a single string column, using the given separator.

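import org.apache.spark.sql.functions.concat_ws

// Sample data: a single array<string> column named "words"
val words = Seq(Array("hello", "world")).toDF("words")
// Join the array elements with a single space as the separator
val solution = words.withColumn("codes", concat_ws(" ", $"words"))
scala> solution.show
+--------------+-----------+
|         words|      codes|
+--------------+-----------+
|[hello, world]|hello world|
+--------------+-----------+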
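Since the question queries the table with SQL, the same function can also be called from a SQL statement. The following is a minimal sketch, assuming a local SparkSession and a small in-memory stand-in for the customer_activity_searches table; the object name, the sample rows, and the comma separator are illustrative choices, not part of the original answer.

```scala
import org.apache.spark.sql.SparkSession

object ConcatWsSqlSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("concat-ws-sql-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for the table from the question.
    Seq(
      Seq("AAA", "RACK", "SMOBIX", "SMOBPX"),
      Seq("LPCT", "RACK")
    ).toDF("rate_plan_code")
      .createOrReplaceTempView("customer_activity_searches")

    // concat_ws accepts an array<string> argument in Spark SQL as well,
    // so it can replace the cast(... as string) from the question.
    spark.sql(
      """SELECT concat_ws(',', rate_plan_code) AS new_rate_plan
        |FROM customer_activity_searches""".stripMargin
    ).show(false)

    spark.stop()
  }
}
```

The separator passed as the first argument controls how the elements are joined, so a space or ", " works equally well.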

map Operator

Use the map operator to have full control over what is transformed and how.

map[U](func: (T) ⇒ U): Dataset[U] Returns a new Dataset that contains the result of applying func to each element.

scala> codes.show(false)
+---+---------------------------+
|id |rate_plan_code             |
+---+---------------------------+
|0  |[AAA, RACK, SMOBIX, SMOBPX]|
+---+---------------------------+

val codesAsSingleString = codes.as[(Long, Array[String])]
  .map { case (id, codes) => (id, codes.mkString(", ")) }
  .toDF("id", "codes")

scala> codesAsSingleString.show(false)
+---+-------------------------+
|id |codes                    |
+---+-------------------------+
|0  |AAA, RACK, SMOBIX, SMOBPX|
+---+-------------------------+

scala> codesAsSingleString.printSchema
root
 |-- id: long (nullable = false)
 |-- codes: string (nullable = true)
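The third option in the list above, a user-defined function, is not shown in the answer as captured here. A minimal sketch of what it might look like follows, assuming a local SparkSession and the same sample row as the map example; the names JoinCodesUdfSketch and joinCodes are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object JoinCodesUdfSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("join-codes-udf-sketch")
      .getOrCreate()
    import spark.implicits._

    // Same shape as the `codes` dataset used in the map example.
    val codes = Seq((0L, Seq("AAA", "RACK", "SMOBIX", "SMOBPX")))
      .toDF("id", "rate_plan_code")

    // An array<string> column reaches a Scala UDF as Seq[String].
    // A null array is passed through as null instead of throwing.
    val joinElements: Seq[String] => String =
      xs => if (xs == null) null else xs.mkString(", ")
    val joinCodes = udf(joinElements)

    codes
      .withColumn("codes", joinCodes($"rate_plan_code"))
      .select("id", "codes")
      .show(false)

    spark.stop()
  }
}
```

A UDF is opaque to Spark's optimizer, so the built-in concat_ws is usually the better choice when a plain separator-joined string is all that is needed.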
