How to convert column of arrays of strings to strings?
Question
I have a column of type array<string> in Spark tables, and I am using SQL to query these tables. I want to convert the array<string> into a string.
I used the following query:
select cast(rate_plan_code as string) as new_rate_plan from
customer_activity_searches group by rate_plan_code
The rate_plan_code column has the following values:
["AAA","RACK","SMOBIX","SMOBPX"]
["LPCT","RACK"]
["LFTIN","RACK","SMOBIX","SMOBPX"]
["LTGD","RACK"]
["RACK","LEARLI","NHDP","LADV","LADV2"]
The new_rate_plan column is populated with the following:
org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@e4273d9f
org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@c1ade2ff
org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@4f378397
org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@d1c81377
org.apache.spark.sql.catalyst.expressions.UnsafeArrayData@552f3317
Cast seems to work when I convert decimal to int or int to double, but not in this case. I am curious why the cast is not working here.
I greatly appreciate your help.
Recommended answer
In Spark 2.1+, you can concatenate the values in a single array column using one of the following:
concat_ws standard function
map operator
user-defined function (UDF)
concat_ws Standard Function
Use the concat_ws function.
concat_ws(sep: String, exprs: Column*): Column Concatenates multiple input string columns together into a single string column, using the given separator.
import org.apache.spark.sql.functions.concat_ws
val solution = words.withColumn("codes", concat_ws(" ", $"words"))
scala> solution.show
+--------------+-----------+
|         words|      codes|
+--------------+-----------+
|[hello, world]|hello world|
+--------------+-----------+
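Since the question queries the table with SQL, concat_ws can also be called directly in the query. The following is a minimal sketch, not the asker's actual setup: the local SparkSession, the two sample rows, and the temporary view standing in for customer_activity_searches are illustrative assumptions, and the GROUP BY from the original query is left out for brevity.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only; in spark-shell a session named `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("concat-ws-sql").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for the customer_activity_searches table from the question.
Seq(
  Seq("AAA", "RACK", "SMOBIX", "SMOBPX"),
  Seq("LPCT", "RACK")
).toDF("rate_plan_code").createOrReplaceTempView("customer_activity_searches")

// concat_ws accepts an array<string> argument and joins its elements with the separator.
val result = spark.sql(
  """SELECT concat_ws(',', rate_plan_code) AS new_rate_plan
    |FROM customer_activity_searches""".stripMargin)

// Collect the joined strings to the driver for inspection.
val plans = result.as[String].collect().toSeq
```

The separator is the first argument, so replacing ',' with ' ' or ', ' changes how the codes are joined.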
map Operator
Use the map operator to have full control over what is transformed and how.
map[U](func: (T) ⇒ U): Dataset[U] Returns a new Dataset that contains the result of applying func to each element.
scala> codes.show(false)
+---+---------------------------+
|id |rate_plan_code             |
+---+---------------------------+
|0  |[AAA, RACK, SMOBIX, SMOBPX]|
+---+---------------------------+
val codesAsSingleString = codes.as[(Long, Array[String])]
  .map { case (id, codes) => (id, codes.mkString(", ")) }
  .toDF("id", "codes")
scala> codesAsSingleString.show(false)
+---+-------------------------+
|id |codes                    |
+---+-------------------------+
|0  |AAA, RACK, SMOBIX, SMOBPX|
+---+-------------------------+
scala> codesAsSingleString.printSchema
root
 |-- id: long (nullable = false)
 |-- codes: string (nullable = true)
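The answer also lists a user-defined function (UDF) as an option but does not show one. A minimal sketch of what that could look like follows. The name joinCodes, the local SparkSession, and the sample row are assumptions made for illustration, not part of the original answer.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

// Local session for illustration only; in spark-shell a session named `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("udf-mkstring").getOrCreate()
import spark.implicits._

// Sample data with the same shape as the `codes` Dataset in the map example.
val codes = Seq((0L, Seq("AAA", "RACK", "SMOBIX", "SMOBPX"))).toDF("id", "rate_plan_code")

// Spark passes array columns to Scala UDFs as Seq, not Array.
// A null array is mapped to null instead of throwing a NullPointerException.
val joinCodes = udf((xs: Seq[String]) => if (xs == null) null else xs.mkString(", "))

// Apply the UDF to the array column and keep the id alongside the joined string.
val viaUdf = codes.select($"id", joinCodes($"rate_plan_code").as("codes"))
val joined = viaUdf.as[(Long, String)].collect().toSeq
```

A UDF is opaque to Spark's optimizer, so concat_ws is usually the better choice when a plain separator is all that is needed. A UDF is mainly useful when the joining logic goes beyond that.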