How to update a column based on a condition (a value in a group)?
Question
I have the following DataFrame:
+---+----+-----+
|sno|dept|color|
+---+----+-----+
| 1| fn| red|
| 2| fn| blue|
| 3| fn|green|
+---+----+-----+
If any value in the color column is red, then all values of the color column should be updated to red, as below:
+---+----+-----+
|sno|dept|color|
+---+----+-----+
| 1| fn| red|
| 2| fn| red|
| 3| fn| red|
+---+----+-----+
I could not figure it out. Please help. I have tried the following code:
val gp = jdbcDF.filter($"dept".contains("fn"))
// .withColumn("newone", when($"dept" === "fn", "RED").otherwise("NULL"))
gp.show()
gp.map(
  row => {
    val row1 = row.getAs[String](1)
    var row2 = row.getAs[String](2)
    val make = if (row1 == "fn") row2 = "red"
    Row(row(0), row(1), make)
  }
).collect().foreach(println)
Recommended answer
Given:
val df = Seq(
(1, "fn", "red"),
(2, "fn", "blue"),
(3, "fn", "green"),
(4, "aa", "blue"),
(5, "aa", "green"),
(6, "bb", "red"),
(7, "bb", "red"),
(8, "aa", "blue")
).toDF("id", "fn", "color")
Do the calculation:
val redOrNot = df.groupBy("fn")
  .agg(collect_set('color) as "values")
  .withColumn("hasRed", array_contains('values, "red"))
// when without otherwise gives null for groups that contain no red
val colorPicker = when('hasRed, "red")
val result = df.join(redOrNot, "fn")
  .withColumn("resultColor", colorPicker)
  .withColumn("color", coalesce('resultColor, 'color)) // coalesce skips the null and falls back to the original color
  .select('id, 'fn, 'color)
The result looks as follows (that seems to be the answer):
scala> result.show
+---+---+-----+
| id| fn|color|
+---+---+-----+
| 1| fn| red|
| 2| fn| red|
| 3| fn| red|
| 4| aa| blue|
| 5| aa|green|
| 6| bb| red|
| 7| bb| red|
| 8| aa| blue|
+---+---+-----+
You can chain when operators and set a default value with otherwise. Consult the scaladoc of the when operator.
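As a rough illustration of chaining, here is a minimal sketch. It assumes a DataFrame that already carries a boolean hasRed column (as produced by the join above); the sample rows, the "unknown" branch, and the names joined and picked are made up for illustration and are not part of the original answer. In spark-shell the SparkSession lines can be skipped because spark already exists.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, when}

// Local session for a standalone run; spark-shell already provides `spark`.
val spark = SparkSession.builder().master("local[*]").appName("when-chain").getOrCreate()
import spark.implicits._

// Hypothetical rows that already have the per-group hasRed flag attached.
val joined = Seq(
  (1, "fn", "blue", true),
  (2, "aa", "green", false),
  (3, "aa", null: String, false)
).toDF("id", "fn", "color", "hasRed")

// Branches are tried in order; otherwise supplies the default,
// so no separate coalesce step is needed.
val picked = joined.withColumn(
  "color",
  when(col("hasRed"), "red")
    .when(col("color").isNull, "unknown") // illustrative second branch
    .otherwise(col("color"))
).select("id", "fn", "color")

picked.show()
```

With otherwise in place, a group without red keeps its original color directly instead of going through an intermediate null.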
I think you could do something very similar (and perhaps more efficiently) using window operators or user-defined aggregate functions (UDAFs), but... well... I don't currently know how to do it. Leaving the comment here to inspire others ;-)
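One possible way to follow up on the window idea is sketched below. This is not the answerer's method, only an assumption about how it could look: a window partitioned by fn computes a per-group flag, which avoids the explicit groupBy and join. The names byGroup and windowed are made up here, and whether this is actually faster has not been measured.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, max, when}

// Local session for a standalone run; spark-shell already provides `spark`.
val spark = SparkSession.builder().master("local[*]").appName("red-per-group").getOrCreate()
import spark.implicits._

val df = Seq(
  (1, "fn", "red"), (2, "fn", "blue"), (3, "fn", "green"),
  (4, "aa", "blue"), (5, "aa", "green"),
  (6, "bb", "red"), (7, "bb", "red"), (8, "aa", "blue")
).toDF("id", "fn", "color")

// All rows that share the same fn value form one window partition.
val byGroup = Window.partitionBy("fn")

val windowed = df
  // 1 for red rows, 0 otherwise; the max over the partition is 1 if any row is red.
  .withColumn("hasRed", max(when(col("color") === "red", 1).otherwise(0)).over(byGroup) === 1)
  // Overwrite color for groups that contain red, keep it elsewhere.
  .withColumn("color", when(col("hasRed"), "red").otherwise(col("color")))
  .drop("hasRed")

windowed.orderBy("id").show()
```

The row order of a window result is not guaranteed, hence the orderBy before show.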
P.S. Learnt a lot! Thanks for the idea!