Row manipulation for Dataframe in spark
Question
I have a dataframe in Spark like this:
column_A | column_B
-------- | --------
1        | 1,12,21
2        | 6,9
Both column_A and column_B are of String type.
How can I convert it to a new dataframe like this:
column_new_A | column_new_B
------------ | ------------
1            | 1
1            | 12
1            | 21
2            | 6
2            | 9
Both column_new_A and column_new_B should be of String type.
Answer
You need to split column_B on the comma and apply the explode function:
import org.apache.spark.sql.functions.{explode, split}
import spark.implicits._  // enables toDF and the $"..." column syntax

val df = Seq(
  ("1", "1,12,21"),
  ("2", "6,9")
).toDF("column_A", "column_B")
You can use either withColumn or select to create the new column. Note that withColumn keeps the original column name, while select lets you rename both columns at the same time.
df.withColumn("column_B", explode(split($"column_B", ","))).show(false)

df.select(
  $"column_A".as("column_new_A"),
  explode(split($"column_B", ",")).as("column_new_B")
).show(false)
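If you prefer SQL-style expressions, the same transform can be written with selectExpr. This is an equivalent sketch, assuming the same df as above:

```scala
// Same split + explode, expressed as SQL expressions.
df.selectExpr(
  "column_A as column_new_A",
  "explode(split(column_B, ',')) as column_new_B"
).show(false)
```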
Output:
+------------+------------+
|column_new_A|column_new_B|
+------------+------------+
|1 |1 |
|1 |12 |
|1 |21 |
|2 |6 |
|2 |9 |
+------------+------------+
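To make it clear what split plus explode actually does per row, here is the same expansion on plain Scala collections (not Spark code, just a conceptual sketch of the row-multiplication):

```scala
// Each (A, "v1,v2,...") row fans out into one (A, v) pair per value,
// mirroring split(column_B, ",") followed by explode.
val rows = Seq(("1", "1,12,21"), ("2", "6,9"))

val exploded = rows.flatMap { case (a, b) =>
  b.split(",").map(v => (a, v))
}

exploded.foreach(println)
// (1,1) (1,12) (1,21) (2,6) (2,9)
```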