复制当前行,对其进行修改并在spark中添加新行 [英] copy current row , modify it and add a new row in spark

查看:258
本文介绍了复制当前行,对其进行修改并在spark中添加新行的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用带有java8版本的spark-sql-2.4.1v. 我有一种情况,我需要复制当前行并创建另一行来修改几列数据,该如何在spark-sql中实现?

I am using spark-sql-2.4.1v with java8 version. I have a scenario where I need to copy current row and create another row modifying few columns data how can this be achieved in spark-sql ?

例如: 给定

 val data = List(
  ("20", "score", "school",  14 ,12),
  ("21", "score", "school",  13 , 13),
  ("22", "rate", "school",  11 ,14)
 )
val df = data.toDF("id", "code", "entity", "value1","value2")

当前输出

+---+-----+------+------+------+
| id| code|entity|value1|value2|
+---+-----+------+------+------+
| 20|score|school|    14|    12|
| 21|score|school|    13|    13|
| 22| rate|school|    11|    14|
+---+-----+------+------+------+

当代码"列是定级"的将其复制为两行,即一是 原始,其次是具有新代码"old_rate"的另一行.喜欢 下方

When column "code" is "rate" copy it as two rows i.e. one is original , second it is another row with new code "old_ rate" like below

预期输出:

+---+--------+------+------+------+
| id|    code|entity|value1|value2|
+---+--------+------+------+------+
| 20|   score|school|    14|    12|
| 21|   score|school|    13|    13|
| 22|    rate|school|    11|    14|
| 22|new_rate|school|    11|    14|
+---+--------+------+------+------+

如何实现这一目标?

推荐答案

使用when检查code === rate,如果匹配,则将该列值替换为array(lit("rate"),lit("new_rate"))&不匹配的列值array($"code"),然后爆炸code列.

Use when to check code === rate, if it is matched then replace that column value with array(lit("rate"),lit("new_rate")) & not matched column values array($"code") then explode code column.

检查以下代码.

scala> df.show(false)
+---+-----+------+------+------+
|id |code |entity|value1|value2|
+---+-----+------+------+------+
|20 |score|school|14    |12    |
|21 |score|school|13    |13    |
|22 |rate |school|11    |14    |
+---+-----+------+------+------+

val colExpr = explode(
    when(
        $"code" === "rate",
        array(
            lit("rate"),
            lit("new_rate")
        )
    )
    .otherwise(array($"code"))
)

scala> df.withColumn("code",colExpr).show(false)
+---+--------+------+------+------+
|id |code    |entity|value1|value2|
+---+--------+------+------+------+
|20 |score   |school|14    |12    |
|21 |score   |school|13    |13    |
|22 |rate    |school|11    |14    |
|22 |new_rate|school|11    |14    |
+---+--------+------+------+------+

这篇关于复制当前行,对其进行修改并在spark中添加新行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆