How to update a column based on a condition (a value in a group)?
Problem description
I have the following DataFrame:
+---+----+-----+
|sno|dept|color|
+---+----+-----+
| 1| fn| red|
| 2| fn| blue|
| 3| fn|green|
+---+----+-----+
If any value in the color column is red, then all values of the color column should be updated to red, as shown below:
+---+----+-----+
|sno|dept|color|
+---+----+-----+
| 1| fn| red|
| 2| fn| red|
| 3| fn| red|
+---+----+-----+
I could not figure it out. Please help; I have tried the following code:
val gp = jdbcDF.filter($"dept".contains("fn"))
  //.withColumn("newone", when($"dept" === "fn", "RED").otherwise("NULL"))
gp.show()
gp.map(
  row => {
    val row1 = row.getAs[String](1)
    var row2 = row.getAs[String](2)
    val make = if (row1 == "fn") row2 = "red"
    Row(row(0), row(1), make)
  }
).collect().foreach(println)
Recommended answer
Given:
val df = Seq(
(1, "fn", "red"),
(2, "fn", "blue"),
(3, "fn", "green"),
(4, "aa", "blue"),
(5, "aa", "green"),
(6, "bb", "red"),
(7, "bb", "red"),
(8, "aa", "blue")
).toDF("id", "fn", "color")
you do the calculation:
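import org.apache.spark.sql.functions._ // already in scope in spark-shell

// For each group, collect its distinct colors and flag whether "red" is among them
val redOrNot = df.groupBy("fn")
  .agg(collect_set('color) as "values")
  .withColumn("hasRed", array_contains('values, "red"))
// when without otherwise yields null where the condition is false
val colorPicker = when('hasRed, "red")
val result = df.join(redOrNot, "fn")
  .withColumn("resultColor", colorPicker)
  .withColumn("color", coalesce('resultColor, 'color)) // first non-null wins, so groups without red keep their color
  .select('id, 'fn, 'color)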
The result looks as follows (which seems to be the answer):
scala> result.show
+---+---+-----+
| id| fn|color|
+---+---+-----+
| 1| fn| red|
| 2| fn| red|
| 3| fn| red|
| 4| aa| blue|
| 5| aa|green|
| 6| bb| red|
| 7| bb| red|
| 8| aa| blue|
+---+---+-----+
You can chain when operators and set a default value with otherwise. Consult the scaladoc of the when operator.
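As a minimal sketch of that remark, the coalesce step can be folded into a single when(...).otherwise(...) expression. The local SparkSession, the app name, and the shortened sample data are assumptions made so the snippet runs on its own; in spark-shell the session and implicits already exist.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array_contains, col, collect_set, when}

// Local session for experimentation only (assumption); spark-shell already provides `spark`.
val spark = SparkSession.builder().master("local[*]").appName("when-otherwise-sketch").getOrCreate()
import spark.implicits._

val df = Seq(
  (1, "fn", "red"), (2, "fn", "blue"), (3, "fn", "green"),
  (4, "aa", "blue"), (5, "aa", "green")
).toDF("id", "fn", "color")

// One row per group, with a flag telling whether the group contains "red".
val redOrNot = df.groupBy("fn")
  .agg(collect_set(col("color")) as "values")
  .withColumn("hasRed", array_contains(col("values"), "red"))

// when(...) alone yields null where the condition is false;
// otherwise(...) supplies the default, so no coalesce is needed.
val result = df.join(redOrNot, "fn")
  .withColumn("color", when(col("hasRed"), "red").otherwise(col("color")))
  .select("id", "fn", "color")
  .orderBy("id")

result.show()
```

Further conditions can be appended with additional .when(...) calls before the final otherwise.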
I think you could do something very similar (and perhaps more efficient) using window operators or user-defined aggregate functions (UDAFs), but...well...I don't currently know how to do it. Leaving this comment here to inspire others ;-)
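For what it is worth, one possible window-based variant is sketched below. It is not part of the original answer: the names byGroup and windowed and the local SparkSession are assumptions, and whether it actually outperforms the join has not been measured here.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, max, when}

// Local session for experimentation only (assumption); spark-shell already provides `spark`.
val spark = SparkSession.builder().master("local[*]").appName("window-sketch").getOrCreate()
import spark.implicits._

val df = Seq(
  (1, "fn", "red"), (2, "fn", "blue"), (3, "fn", "green"),
  (4, "aa", "blue"), (5, "aa", "green"),
  (6, "bb", "red"), (7, "bb", "red"), (8, "aa", "blue")
).toDF("id", "fn", "color")

// All rows sharing the same "fn" value form one window partition.
val byGroup = Window.partitionBy("fn")

val windowed = df
  // 1 for red rows, 0 otherwise; the max over the partition is 1 only if the group has a red row.
  .withColumn("hasRed", max(when(col("color") === "red", 1).otherwise(0)).over(byGroup))
  // Overwrite color for flagged groups, keep the original value elsewhere.
  .withColumn("color", when(col("hasRed") === 1, "red").otherwise(col("color")))
  .drop("hasRed")
  .orderBy("id")

windowed.show()
```

This avoids the separate aggregate and join: the per-group flag is attached to every row directly by the window.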
P.S. Learnt a lot! Thanks for the idea!