如何根据条件(组中的值)更新列? [英] How to update column based on a condition (a value in a group)?

查看:85
本文介绍了如何根据条件(组中的值)更新列?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有以下df:

+---+----+-----+
|sno|dept|color|
+---+----+-----+
|  1|  fn|  red|
|  2|  fn| blue|
|  3|  fn|green|
+---+----+-----+

如果任何颜色列的值是red,那么我所有颜色列的值都应更新为red,如下所示:

If any of the color column values is red, then I all values of the color column should be updated to be red, as below:

+---+----+-----+
|sno|dept|color|
+---+----+-----+
|  1|  fn|  red|
|  2|  fn|  red|
|  3|  fn|  red|
+---+----+-----+

我不知道.请帮忙;我尝试了以下代码:

I could not figure it out. Please help; I have tried following code:

val gp=jdbcDF.filter($"dept".contains("fn"))
     //.withColumn("newone",when($"dept"==="fn","RED").otherwise("NULL"))
    gp.show()
gp.map(
  row=>{
    val row1=row.getAs[String](1)
    var row2=row.getAs[String](2)
    val make=if(row1 =="fn") row2="red"
    Row(row(0),row(1),make)
  }
).collect().foreach(println)

推荐答案

给出:

val df = Seq(
  (1, "fn", "red"),
  (2, "fn", "blue"),
  (3, "fn", "green"),
  (4, "aa", "blue"),
  (5, "aa", "green"),
  (6, "bb", "red"),
  (7, "bb", "red"),
  (8, "aa", "blue")
).toDF("id", "fn", "color")

进行计算:

val redOrNot = df.groupBy("fn")
  .agg(collect_set('color) as "values")
  .withColumn("hasRed", array_contains('values, "red"))

// gives null for no option
val colorPicker = when('hasRed, "red")
val result = df.join(redOrNot, "fn")
  .withColumn("resultColor", colorPicker) 
  .withColumn("color", coalesce('resultColor, 'color)) // skips nulls that leads to the answer
  .select('id, 'fn, 'color)

result看起来如下(这似乎是一个答案):

The result looks as follows (that seems to be an answer):

scala> result.show
+---+---+-----+
| id| fn|color|
+---+---+-----+
|  1| fn|  red|
|  2| fn|  red|
|  3| fn|  red|
|  4| aa| blue|
|  5| aa|green|
|  6| bb|  red|
|  7| bb|  red|
|  8| aa| blue|
+---+---+-----+

您可以链接when运算符,并为otherwise设置默认值.请查阅

You can chain when operators and have a default value with otherwise. Consult the scaladoc of when operator.

我认为您可以使用窗口运算符或用户定义的聚合函数(UDAF)来执行非常相似的操作(也许更有效),但是...嗯...目前尚不知道该怎么做.在这里留下评论以启发他人;-)

I think you could do something very similar (and perhaps more efficient) using windowed operators or user-defined aggregate functions (UDAF), but...well...don't currently know how to do it. Leaving the comment here to inspire others ;-)

p.s.学到了很多!谢谢你的主意!

p.s. Learnt a lot! Thanks for the idea!

这篇关于如何根据条件(组中的值)更新列?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆