How to update column value for a particular row randomly by using column name

Question

def getSequence(row: Row): Seq[String] = {
  // some code (elided in the original question)
  ???
}

Basically, I want to iterate over the DataFrame row by row and, for each row, set the value to 1 in every column named in the sequence returned by getSequence.

Input

+---+----+-----+
|sno|dept|color|
+---+----+-----+
|  1|  0 |  0  |
|  2|  0 |  0  |
|  3|  0 |  0  |
+---+----+-----+

getSequence gives Seq("dept") for row 1, Seq("color") for row 2, and Seq("dept", "color") for row 3.

The output should look like this:
+---+----+-----+
|sno|dept|color|
+---+----+-----+
|  1|  1 |  0  |
|  2|  0 |  1  |
|  3|  1 |  1  |
+---+----+-----+

Recommended answer

def lit(literal: Any): org.apache.spark.sql.Column

def monotonically_increasing_id(): org.apache.spark.sql.Column
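A caveat about monotonically_increasing_id: the Spark documentation guarantees that the generated IDs are increasing and unique, but not that they are consecutive, so they only come out as 0, 1, 2 when all rows sit in a single partition. If consecutive row numbers are needed, one possible alternative is row_number over a window. The sketch below assumes that ordering by sno is acceptable; the object name ConsecutiveIds and the helper withConsecutiveId are hypothetical names used only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

object ConsecutiveIds {
  // Hypothetical helper: adds an "id" column with values 0, 1, 2, ...
  // Assumption: rows are numbered in ascending order of "sno".
  // A window without partitioning moves all rows into one partition,
  // so this is only suitable for small DataFrames.
  def withConsecutiveId(df: DataFrame): DataFrame =
    df.withColumn("id", row_number().over(Window.orderBy(col("sno"))) - 1)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("consecutive-ids")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, 0, 0), (2, 0, 0), (3, 0, 0)).toDF("sno", "dept", "color")
    withConsecutiveId(df).show(false)

    spark.stop()
  }
}
```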

Use the lit function to update column values.

Please check the code below to update specific columns.

scala> val df = Seq((1,0,0),(2,0,0),(3,0,0)).toDF("sno","dept","color").withColumn("id",monotonically_increasing_id)
df: org.apache.spark.sql.DataFrame = [sno: int, dept: int ... 2 more fields]

scala> df.withColumn("dept",when($"id" =!= 1,lit(1)).otherwise(lit(0))).withColumn("color",when($"id" =!= 0,lit(1)).otherwise(lit(0))).drop("id").show(false)
+---+----+-----+
|sno|dept|color|
+---+----+-----+
|1  |1   |0    |
|2  |0   |1    |
|3  |1   |1    |
+---+----+-----+
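The code above hard-codes which row ids receive a 1 and never calls getSequence. A more general approach, sketched here as one possibility rather than as the original answer's method, is to wrap getSequence in a UDF that receives the whole row as a struct, store the returned column names in a temporary array column, and then set each target column to 1 when its name appears in that array. The getSequence body below is a hypothetical stand-in that reproduces the mapping from the question, and the names UpdateByColumnName, updateNamedColumns, and to_set are assumptions made for this example.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.functions.{array_contains, col, lit, struct, udf, when}

object UpdateByColumnName {
  // Hypothetical stand-in for the question's getSequence:
  // it reproduces the row-to-columns mapping given in the question.
  def getSequence(row: Row): Seq[String] = row.getAs[Int]("sno") match {
    case 1 => Seq("dept")
    case 2 => Seq("color")
    case 3 => Seq("dept", "color")
    case _ => Seq.empty
  }

  // Sets each column in `targets` to 1 on the rows for which
  // getSequence returns that column's name; other values are kept.
  def updateNamedColumns(df: DataFrame, targets: Seq[String]): DataFrame = {
    // The UDF receives the entire row, packed as a struct column.
    val namesUdf = udf((row: Row) => getSequence(row))
    val withNames =
      df.withColumn("to_set", namesUdf(struct(df.columns.map(col): _*)))

    targets.foldLeft(withNames) { (acc, name) =>
      acc.withColumn(
        name,
        when(array_contains(col("to_set"), name), lit(1)).otherwise(col(name)))
    }.drop("to_set")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("update-by-column-name")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, 0, 0), (2, 0, 0), (3, 0, 0)).toDF("sno", "dept", "color")
    updateNamedColumns(df, Seq("dept", "color")).show(false)

    spark.stop()
  }
}
```

Because this version does not depend on a generated row id, it avoids relying on how monotonically_increasing_id numbers rows across partitions. With the stand-in getSequence, the result should match the output table requested in the question.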
