Spark Scala: How to update each column of a DataFrame in correspondence with each position of a Vector


Question

I have a DF like this:

+--------------------+-----+--------------------+
|               col_0|col_1|               col_2|
+--------------------+-----+--------------------+
|0.009069428120139292|  0.3|9.015488712438252E-6|
|0.008070826019024355|  0.4|3.379696051366339...|
|0.009774715414895803|  0.1|1.299590589291292...|
|0.009631155146285946|  0.9|1.218569739510422...|

and two vectors:

v1 = [7.0, 0.007, 0.052]
v2 = [804.0, 553.0, 143993.0]

The total number of columns is the same as the total number of positions in each vector. How can I apply an equation using the number saved in the ith position to compute and update the current value of the DF (in the ith position)? I mean, I need to update all values in the DF using the values in the vectors.

Answer

Perhaps something like this is what you're after?

import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{col, lit}
import spark.implicits._  // needed for toDF; `spark` is the active SparkSession

val df = Seq((1, 2, 3), (4, 5, 6)).toDF

val updateVector = Vector(10, 20, 30)

// The update to apply at each position; here, multiply the column by the vector value.
val updateFunction = (columnValue: Column, vectorValue: Int) => columnValue * lit(vectorValue)

// Pair each vector value with the column at the same index and build the updated column expressions.
val updateColumns = (df: DataFrame, updateVector: Vector[Int], updateFunction: (Column, Int) => Column) => {
    val columns = df.columns
    updateVector.zipWithIndex.map { case (updateValue, index) =>
        updateFunction(col(columns(index)), updateValue).as(columns(index))
    }
}

val dfUpdated = df.select(updateColumns(df, updateVector, updateFunction): _*)

dfUpdated.show

dfUpdated.show

+---+---+---+
| _1| _2| _3|
+---+---+---+
| 10| 40| 90|
| 40|100|180|
+---+---+---+
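The per-position update logic itself is independent of Spark: each row is zipped with the update vector and the function is applied elementwise. A minimal sketch with plain Scala collections (the `PerPositionUpdate` object and its names are illustrative, not part of the original answer) shows the same pattern without needing a SparkSession:

```scala
// Illustrative sketch: apply an update function at each position of a row,
// mirroring the zipWithIndex pattern in the Spark answer above.
object PerPositionUpdate {
  // Pair each row value with the update value at the same position, then apply f.
  def updateRow(row: Vector[Double], updates: Vector[Double])(
      f: (Double, Double) => Double): Vector[Double] =
    row.zip(updates).map { case (v, u) => f(v, u) }

  def main(args: Array[String]): Unit = {
    val rows = Vector(Vector(1.0, 2.0, 3.0), Vector(4.0, 5.0, 6.0))
    val updateVector = Vector(10.0, 20.0, 30.0)
    // Elementwise multiplication, matching the updateFunction used above.
    val updated = rows.map(r => updateRow(r, updateVector)(_ * _))
    println(updated) // Vector(Vector(10.0, 40.0, 90.0), Vector(40.0, 100.0, 180.0))
  }
}
```

With a second vector such as v2, the same `updateRow` can be chained (e.g. multiply by v1, then add v2) to combine both vectors in one pass over the columns.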
