Convert Array of String column to multiple columns in Spark Scala
Question
I have a dataframe with the following schema:
id : int,
emp_details: Array(String)
Some sample data:
1, Array(empname=xxx,city=yyy,zip=12345)
2, Array(empname=bbb,city=bbb,zip=22345)
This data is in a dataframe, and I need to read emp_details from the array and assign its values to new columns as shown below, or split the array into multiple columns named empname, city and zip:
.withColumn("empname", xxx)
.withColumn("city", yyy)
.withColumn("zip", 12345)
Could you please guide me on how to achieve this using Spark (1.6) with Scala?
I would really appreciate your help. Thanks a lot.
Recommended answer
You can use withColumn and split to get the required data:
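import org.apache.spark.sql.functions.split
// $"..." needs import sqlContext.implicits._; element i is split on "=" and the value part is kept
df1.withColumn("empname", split($"emp_details"(0), "=")(1))
  .withColumn("city", split($"emp_details"(1), "=")(1))
  .withColumn("zip", split($"emp_details"(2), "=")(1))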
Output:
+---+----------------------------------+-------+----+-----+
|id |emp_details                       |empname|city|zip  |
+---+----------------------------------+-------+----+-----+
|1  |[empname=xxx, city=yyy, zip=12345]|xxx    |yyy |12345|
|2  |[empname=bbb, city=bbb, zip=22345]|bbb    |bbb |22345|
+---+----------------------------------+-------+----+-----+
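When there are more than a few fields, the repeated withColumn calls can be folded over a list of names. The following is a minimal sketch, not part of the original answer: the helper name addPositionalColumns and its parameters are hypothetical, and it assumes, like the code above, that the elements always appear in the same order and each contains an "=".

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, split}

// Hypothetical helper: for each name at position i, take element i of the
// array column, split it on "=", and store the value part in a new column.
def addPositionalColumns(df: DataFrame, arrayCol: String, names: Seq[String]): DataFrame =
  names.zipWithIndex.foldLeft(df) { case (acc, (name, i)) =>
    acc.withColumn(name, split(col(arrayCol)(i), "=")(1))
  }
```

It would be called as addPositionalColumns(df1, "emp_details", Seq("empname", "city", "zip")). Like the original, it relies on position only and does not check that the key in each element matches the column name.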
UPDATE:
If the data in the array does not come in a fixed order, you can use a UDF to convert it to a map and use it as follows:
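import org.apache.spark.sql.functions.udf

val getColumnsUDF = udf((details: Seq[String]) => {
  // Build a key -> value map so the order of the array elements does not matter
  val detailsMap = details.map(_.split("=", 2)).map(x => (x(0), x(1))).toMap
  (detailsMap("empname"), detailsMap("city"), detailsMap("zip"))
})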
Now use the UDF:
df1.withColumn("emp",getColumnsUDF($"emp_details"))
.select($"id", $"emp._1".as("empname"), $"emp._2".as("city"), $"emp._3".as("zip"))
.show(false)
Output:
+---+-------+----+-----+
|id |empname|city|zip  |
+---+-------+----+-----+
|1  |xxx    |yyy |12345|
|2  |bbb    |bbb |22345|
+---+-------+----+-----+
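The UDF above throws an exception if an element has no "=" or if one of the three keys is missing from the array. As a rough sketch of a more forgiving parsing step, the plain Scala functions below skip malformed elements and return None for absent keys. The names parseDetails and extractFields are hypothetical and not part of the original answer, and trimming whitespace around keys and values is an assumption.

```scala
// Hypothetical helper: parse "key=value" strings into a map.
// Elements without "=" are skipped; only the first "=" separates key from value.
def parseDetails(details: Seq[String]): Map[String, String] =
  details
    .map(_.split("=", 2))
    .collect { case Array(k, v) => k.trim -> v.trim }
    .toMap

// Look up the wanted keys in a fixed order; a missing key yields None.
def extractFields(details: Seq[String], keys: Seq[String]): Seq[Option[String]] = {
  val parsed = parseDetails(details)
  keys.map(parsed.get)
}
```

The same parsing could replace the map construction inside getColumnsUDF. How a missing key should appear in the resulting column (for example as null or as a default value) is a design choice left open here.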
Hope this helps!