Row manipulation for a DataFrame in Spark

Question

I have a DataFrame in Spark that looks like this:

 column_A | column_B
 ---------  --------
  1          1,12,21
  2          6,9

Both column_A and column_B are of String type.

How can I convert the above DataFrame into a new DataFrame that looks like this:

  column_new_A | column_new_B
  ------------   ------------
     1             1
     1             12
     1             21
     2             6
     2             9

Both column_new_A and column_new_B should be of String type.

Recommended answer

You need to split column_B on the comma and then apply the explode function, which emits one row per array element. Starting from:

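// Assumes a SparkSession named `spark` is in scope (as in spark-shell)
import spark.implicits._                                // enables toDF and the $"..." syntax
import org.apache.spark.sql.functions.{explode, split}

val df = Seq(
  ("1", "1,12,21"),
  ("2", "6,9")
).toDF("column_A", "column_B")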

You can use either withColumn or select to create the new column.

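// Option 1: withColumn overwrites column_B and keeps the original column names
df.withColumn("column_B", explode(split($"column_B", ","))).show(false)

// Option 2: select renames both columns while exploding
df.select($"column_A".as("column_new_A"), explode(split($"column_B", ",")).as("column_new_B")).show(false)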

Output of the select version:

+------------+------------+
|column_new_A|column_new_B|
+------------+------------+
|1           |1           |
|1           |12          |
|1           |21          |
|2           |6           |
|2           |9           |
+------------+------------+
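
If you want to try this outside spark-shell, the following is a minimal sketch of a standalone program. The object name ExplodeExample, the helper explodeRows, and the local[*] master are assumptions made for illustration; they are not part of the original answer. Because split produces an array of strings, the exploded column should come out as a string, which you can confirm with printSchema.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode, split}

// Hypothetical wrapper object, named here only for illustration.
object ExplodeExample {

  // Splits column_B on commas and emits one output row per element.
  def explodeRows(df: DataFrame): DataFrame =
    df.select(
      col("column_A").as("column_new_A"),
      explode(split(col("column_B"), ",")).as("column_new_B")
    )

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for experimentation.
    val spark = SparkSession.builder()
      .appName("explode-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._ // needed for toDF on a Seq of tuples

    val df = Seq(
      ("1", "1,12,21"),
      ("2", "6,9")
    ).toDF("column_A", "column_B")

    val result = explodeRows(df)
    result.printSchema() // lets you check that both columns are strings
    result.show(false)

    spark.stop()
  }
}
```

Note that explode skips rows whose array is null or empty. If such rows need to be kept, explode_outer (available since Spark 2.2) is worth a look.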
