Pyspark: Add the average as a new column to DataFrame
Question

I am computing the mean of a column in a DataFrame, but it results in all zeros. Can someone help me understand why this is happening? Below are the code and the table before and after the transformation of the column.
result.select("dis_price_released").show(10)
+------------------+
|dis_price_released|
+------------------+
| 0.0|
| 4.0|
| 4.0|
| 4.0|
| 1.0|
| 4.0|
| 4.0|
| 0.0|
| 4.0|
| 0.0|
+------------------+
After computing the mean and adding it as a column:

from pyspark.sql import Window
from pyspark.sql.functions import avg
import sys

w = Window().partitionBy("dis_price_released").rowsBetween(-sys.maxsize, sys.maxsize)
df2 = result.withColumn("mean", avg("dis_price_released").over(w))
df2.select("dis_price_released", "mean").show(10)
+------------------+----+
|dis_price_released|mean|
+------------------+----+
| 0.0| 0.0|
| 0.0| 0.0|
| 0.0| 0.0|
| 0.0| 0.0|
| 0.0| 0.0|
| 0.0| 0.0|
| 0.0| 0.0|
| 0.0| 0.0|
| 0.0| 0.0|
| 0.0| 0.0|
+------------------+----+
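To see why the window above produces zeros: partitionBy("dis_price_released") groups rows by the very column being averaged, so each partition's mean is simply the partition's own value, and show(10) happens to display rows from the partition holding the 0.0 values. A plain-Python sketch of that grouping (values copied from the first table):

```python
from collections import defaultdict

# Values of the dis_price_released column from the first table
values = [0.0, 4.0, 4.0, 4.0, 1.0, 4.0, 4.0, 0.0, 4.0, 0.0]

# Simulate Window().partitionBy("dis_price_released"): rows are
# grouped by the same column we are averaging over.
partitions = defaultdict(list)
for v in values:
    partitions[v].append(v)

# Each partition's mean is necessarily the partition key itself.
means = {k: sum(rows) / len(rows) for k, rows in partitions.items()}
print(means)  # {0.0: 0.0, 4.0: 4.0, 1.0: 1.0}
```

Every 0.0 row therefore gets a per-partition mean of 0.0, which is exactly what the second table shows.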
Answer

You can compute the avg first for the whole column, then use lit() to add it to your DataFrame as a literal column; there is no need for window functions:
from pyspark.sql.functions import lit

# Aggregate over the whole DataFrame (empty groupBy) and pull the
# resulting global average out as a plain Python float.
mean = df.groupBy().avg("dis_price_released").take(1)[0][0]

# lit() wraps the constant so it can be attached as a column.
df.withColumn("test", lit(mean)).show()
+------------------+----+
|dis_price_released|test|
+------------------+----+
| 0.0| 2.5|
| 4.0| 2.5|
| 4.0| 2.5|
| 4.0| 2.5|
| 1.0| 2.5|
| 4.0| 2.5|
| 4.0| 2.5|
| 0.0| 2.5|
| 4.0| 2.5|
| 0.0| 2.5|
+------------------+----+