Spark Scala: How to update each column of a DataFrame in correspondence with each position of a Vector


Question

I have a DF like this:

+--------------------+-----+--------------------+
|               col_0|col_1|               col_2|
+--------------------+-----+--------------------+
|0.009069428120139292|  0.3|9.015488712438252E-6|
|0.008070826019024355|  0.4|3.379696051366339...|
|0.009774715414895803|  0.1|1.299590589291292...|
|0.009631155146285946|  0.9|1.218569739510422...|
+--------------------+-----+--------------------+

and two vectors:

v1[7.0,0.007,0.052]
v2[804.0,553.0,143993.0]

The total number of columns is the same as the number of positions in each vector. How can I apply an equation, using the number stored at the ith position of a vector, to update the current value in the ith column of the DF? In other words, I need to update all values in the DF using the values in the vectors.

Answer

Perhaps something like this is what you're after?

import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{col, lit}
import spark.implicits._  // needed for toDF

val df = Seq((1, 2, 3), (4, 5, 6)).toDF

val updateVector = Vector(10, 20, 30)

// Multiply a column by the vector value that corresponds to its position
val updateFunction = (columnValue: Column, vectorValue: Int) => columnValue * lit(vectorValue)

// Build one updated Column expression per (vector value, column) pair
val updateColumns = (df: DataFrame, updateVector: Vector[Int], updateFunction: (Column, Int) => Column) => {
    val columns = df.columns
    updateVector.zipWithIndex.map { case (updateValue, index) =>
        updateFunction(col(columns(index)), updateValue).as(columns(index))
    }
}

val dfUpdated = df.select(updateColumns(df, updateVector, updateFunction): _*)

dfUpdated.show

dfUpdated.show

+---+---+---+
| _1| _2| _3|
+---+---+---+
| 10| 40| 90|
| 40|100|180|
+---+---+---+
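The core of the answer is the zip-based pairing of each column with its update value; the same pattern can be seen without Spark at all. The sketch below (a hypothetical, Spark-free illustration, not part of the original answer) applies the identical element-wise update to plain Scala Vectors:

```scala
object ZipUpdateSketch {
  def main(args: Array[String]): Unit = {
    // Each inner Vector plays the role of one DataFrame row
    val rows = Vector(Vector(1.0, 2.0, 3.0), Vector(4.0, 5.0, 6.0))
    val updateVector = Vector(10.0, 20.0, 30.0)

    // For every row, pair position i with updateVector(i) and multiply --
    // the same zip pattern the Spark answer applies to Column expressions
    val updated = rows.map(row => row.zip(updateVector).map { case (v, u) => v * u })

    println(updated) // Vector(Vector(10.0, 40.0, 90.0), Vector(40.0, 100.0, 180.0))
  }
}
```

Because `zip` truncates to the shorter collection, a vector with fewer entries than columns would silently drop the extra columns, so in practice you may want to assert that the lengths match first.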

