pyspark: how to return the average of a column based on the value of another column?
Question
I wouldn't expect this to be difficult, but I'm having trouble understanding how to take the average of a column in my Spark dataframe.
The dataframe looks like this:
+-------+------------+--------+------------------+
|Private|Applications|Accepted| Rate|
+-------+------------+--------+------------------+
| Yes| 417| 349|0.8369304556354916|
| Yes| 1899| 1720|0.9057398630858347|
| Yes| 1732| 1425|0.8227482678983834|
| Yes| 494| 313|0.6336032388663968|
| No| 3540| 2001|0.5652542372881356|
| No| 7313| 4664|0.6377683577191303|
| Yes| 619| 516|0.8336025848142165|
| Yes| 662| 513|0.7749244712990937|
| Yes| 761| 725|0.9526938239159002|
| Yes| 1690| 1366| 0.808284023668639|
| Yes| 6075| 5349|0.8804938271604938|
| Yes| 632| 494|0.7816455696202531|
| No| 1208| 877|0.7259933774834437|
| Yes| 20192| 13007|0.6441660063391442|
| Yes| 1436| 1228|0.8551532033426184|
| Yes| 392| 351|0.8954081632653061|
| Yes| 12586| 3239|0.2573494358811378|
| Yes| 1011| 604|0.5974282888229476|
| Yes| 848| 587|0.6922169811320755|
| Yes| 8728| 5201|0.5958982584784601|
+-------+------------+--------+------------------+
I want to return the average of the Rate column when Private is equal to "Yes". How can I do this?
Answer
Try:

df.filter(df['Private'] == 'Yes').agg({'Rate': 'avg'}).collect()[0]

Note that collect()[0] gives you a Row object; index into it once more (collect()[0][0]) if you want the bare float.
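If you don't have a Spark session handy, here is a minimal plain-Python sketch of what that filter-then-average does, using the (Private, Rate) pairs from the sample dataframe above. It only illustrates the arithmetic; it is not the PySpark API:

```python
# (Private, Rate) pairs copied from the sample dataframe above.
rows = [
    ("Yes", 0.8369304556354916), ("Yes", 0.9057398630858347),
    ("Yes", 0.8227482678983834), ("Yes", 0.6336032388663968),
    ("No",  0.5652542372881356), ("No",  0.6377683577191303),
    ("Yes", 0.8336025848142165), ("Yes", 0.7749244712990937),
    ("Yes", 0.9526938239159002), ("Yes", 0.808284023668639),
    ("Yes", 0.8804938271604938), ("Yes", 0.7816455696202531),
    ("No",  0.7259933774834437), ("Yes", 0.6441660063391442),
    ("Yes", 0.8551532033426184), ("Yes", 0.8954081632653061),
    ("Yes", 0.2573494358811378), ("Yes", 0.5974282888229476),
    ("Yes", 0.6922169811320755), ("Yes", 0.5958982584784601),
]

# Equivalent of df.filter(df['Private'] == 'Yes'): keep only the "Yes" rows.
yes_rates = [rate for private, rate in rows if private == "Yes"]

# Equivalent of .agg({'Rate': 'avg'}): mean of the remaining Rate values.
avg_rate = sum(yes_rates) / len(yes_rates)
print(avg_rate)  # roughly 0.751 for these 17 "Yes" rows
```

Spark computes the same mean, just distributed across partitions instead of in one Python list.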