复制当前行,对其进行修改并在spark中添加新行 [英] copy current row , modify it and add a new row in spark
问题描述
我正在使用带有java8版本的spark-sql-2.4.1v. 我有一种情况,我需要复制当前行并创建另一行来修改几列数据,该如何在spark-sql中实现?
I am using spark-sql-2.4.1v with java8 version. I have a scenario where I need to copy current row and create another row modifying few columns data how can this be achieved in spark-sql ?
例如: 给定
val data = List(
("20", "score", "school", 14 ,12),
("21", "score", "school", 13 , 13),
("22", "rate", "school", 11 ,14)
)
val df = data.toDF("id", "code", "entity", "value1","value2")
当前输出
+---+-----+------+------+------+
| id| code|entity|value1|value2|
+---+-----+------+------+------+
| 20|score|school| 14| 12|
| 21|score|school| 13| 13|
| 22| rate|school| 11| 14|
+---+-----+------+------+------+
当代码"列是定级"的将其复制为两行,即一是 原始,其次是具有新代码"old_rate"的另一行.喜欢 下方
When column "code" is "rate" copy it as two rows i.e. one is original , second it is another row with new code "old_ rate" like below
预期输出:
+---+--------+------+------+------+
| id| code|entity|value1|value2|
+---+--------+------+------+------+
| 20| score|school| 14| 12|
| 21| score|school| 13| 13|
| 22| rate|school| 11| 14|
| 22|new_rate|school| 11| 14|
+---+--------+------+------+------+
如何实现这一目标?
推荐答案
使用when
检查code === rate
,如果匹配,则将该列值替换为array(lit("rate"),lit("new_rate"))
&不匹配的列值array($"code")
,然后爆炸code
列.
Use when
to check code === rate
, if it is matched then replace that column value with array(lit("rate"),lit("new_rate"))
& not matched column values array($"code")
then explode code
column.
检查以下代码.
scala> df.show(false)
+---+-----+------+------+------+
|id |code |entity|value1|value2|
+---+-----+------+------+------+
|20 |score|school|14 |12 |
|21 |score|school|13 |13 |
|22 |rate |school|11 |14 |
+---+-----+------+------+------+
val colExpr = explode(
when(
$"code" === "rate",
array(
lit("rate"),
lit("new_rate")
)
)
.otherwise(array($"code"))
)
scala> df.withColumn("code",colExpr).show(false)
+---+--------+------+------+------+
|id |code |entity|value1|value2|
+---+--------+------+------+------+
|20 |score |school|14 |12 |
|21 |score |school|13 |13 |
|22 |rate |school|11 |14 |
|22 |new_rate|school|11 |14 |
+---+--------+------+------+------+
这篇关于复制当前行,对其进行修改并在spark中添加新行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!