Spark Scala - 如何迭代数据帧中的行，并将计算值添加为数据帧的新列 [英] Spark Scala - How do I iterate rows in dataframe, and add calculated values as new columns of the data frame

查看：32 发布时间：2021/11/14 22:42:03 scala apache-spark apache-spark-sql spark-dataframe

本文介绍了Spark Scala - 如何迭代数据帧中的行，并将计算值添加为数据帧的新列的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我有一个包含date"和value"两列的数据框，如何向数据框中添加 2 个新列value_mean"和value_sd"，其中value_mean"是过去 10 个value"的平均值天(包括日期"中指定的当天)和value_sd"是过去 10 天值"的标准偏差?

I have a dataframe with two columns "date" and "value", how do I add 2 new columns "value_mean" and "value_sd" to the dataframe where "value_mean" is the average of "value" over the last 10 days (including the current day as specified in "date") and "value_sd" is the standard deviation of the "value" over the last 10 days?

推荐答案

Spark sql 提供各种数据框函数，如avg、mean、sum等

Spark sql provide the various dataframe function like avg,mean,sum etc.

您只需要使用 spark sql 列

import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.Column

为标准偏差创建私有方法

Create private method for standard deviation

private def stddev(col: Column): Column = sqrt(avg(col * col) - avg(col) * avg(col))

现在您可以为平均值和标准偏差创建 sql 列

Now you can create sql Column for average and standard deviation

val value_sd: org.apache.spark.sql.Column = stddev(df.col("value")).as("value_sd")
val value_mean: org.apache.spark.sql.Column = avg(df.col("value").as("value_mean"))

根据需要过滤过去 10 天的数据框

Filter your dataframe for last 10 days or as you want

val filterDF=df.filter("")//put your filter condition

现在您可以在 filterDF 上应用聚合函数

Now yon can apply the aggregate function on your filterDF

filterDF.agg(stdv, value_mean).show

这篇关于Spark Scala - 如何迭代数据帧中的行，并将计算值添加为数据帧的新列的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

Spark Scala - 如何迭代数据帧中的行，并将计算值添加为数据帧的新列 [英] Spark Scala - How do I iterate rows in dataframe, and add calculated values as new columns of the data frame

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

Spark Scala - 如何迭代数据帧中的行，并将计算值添加为数据帧的新列 [英] Spark Scala - How do I iterate rows in dataframe, and add calculated values as new columns of the data frame

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭