Row manipulation for a DataFrame in Spark
Question
I have a DataFrame in Spark that looks like this:
column_A | column_B
--------- --------
1 1,12,21
2 6,9
Both column_A and column_B are of String type.
How can I convert the above DataFrame into a new DataFrame that looks like this:
column_new_A | column_new_B
----------- ------------
1 1
1 12
1 21
2 6
2 9
Both column_new_A and column_new_B should be of String type.
Recommended answer
You need to split column_B on commas and pass the resulting array to the explode function, which emits one row per array element. Using this sample DataFrame:
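// Assumes a SparkSession named `spark`, as in spark-shell.
import org.apache.spark.sql.functions.{explode, split}
import spark.implicits._  // enables toDF and the $"..." column syntax
val df = Seq(
  ("1", "1,12,21"),
  ("2", "6,9")
).toDF("column_A", "column_B")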
You can use either withColumn or select to create the new column.
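// withColumn overwrites column_B in place and keeps the original column names.
df.withColumn("column_B", explode(split($"column_B", ","))).show(false)
// select lets you rename both columns in the same step.
df.select($"column_A".as("column_new_A"), explode(split($"column_B", ",")).as("column_new_B")).show(false)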
Output of the select version:
+------------+------------+
|column_new_A|column_new_B|
+------------+------------+
|1 |1 |
|1 |12 |
|1 |21 |
|2 |6 |
|2 |9 |
+------------+------------+
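To make the row-multiplying behaviour of split plus explode more concrete, here is a minimal plain-Scala sketch of the same transformation with no Spark dependency. It is only a model of the semantics, not how Spark implements it. The names `ExplodeSketch`, `Record`, and `explodeRows` are hypothetical and exist only for this illustration.

```scala
object ExplodeSketch {
  // Hypothetical stand-in for one DataFrame row with two String columns.
  final case class Record(columnA: String, columnB: String)

  // Models explode(split(column_B, ",")): one output row per comma-separated
  // value, with columnA repeated on each of them.
  def explodeRows(rows: Seq[Record]): Seq[Record] =
    rows.flatMap { row =>
      // The -1 limit keeps trailing empty fields instead of dropping them.
      row.columnB.split(",", -1).toSeq.map(value => Record(row.columnA, value))
    }

  def main(args: Array[String]): Unit = {
    val input = Seq(Record("1", "1,12,21"), Record("2", "6,9"))
    // Prints one line per exploded row, e.g. "1 12".
    explodeRows(input).foreach(r => println(s"${r.columnA} ${r.columnB}"))
  }
}
```

The flatMap is the key step. Each input row yields a small sequence of rows, and the sequences are concatenated in order, so a row with three values becomes three rows. Both fields remain Strings throughout, matching the requirement in the question.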