Pandas DataFrame: mean of column B values within column A windows

Published: 2020/10/17 0:33:25 · Tags: python, pandas, dataframe, mean, binning

Question

If I have a pandas DataFrame in Python such as follows:

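import numpy as np
import pandas as pd

# 20 random values: A drawn uniformly from [0, 10), B from [0, 1).
# No seed is set, so each run produces numbers different from those shown below.
a = np.random.uniform(0,10,20)
b = np.random.uniform(0,1,20)
data = np.vstack([a,b]).T  # shape (20, 2): one row per (A, B) pair

df = pd.DataFrame(data)
df.columns = ['A','B']
# sort_values returns a sorted copy (displayed below); df itself is not modified.
df.sort_values(by=['A'])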

           A         B
5   0.057519  0.465408
14  1.610972  0.398077
3   1.725556  0.397708
17  1.734124  0.600723
11  1.944105  0.694152
19  3.265799  0.878538
13  3.352460  0.770505
10  3.865299  0.064723
16  4.137863  0.659662
12  5.597172  0.122269
7   5.990105  0.667533
6   6.410582  0.193027
9   6.881429  0.041691
15  7.522877  0.268144
1   8.093155  0.130559
0   8.699004  0.996624
8   8.755095  0.495984
4   9.135271  0.792966
18  9.440045  0.477514
2   9.654226  0.509812

Is it possible to efficiently calculate the mean of column B values in intervals of column A?

For example, one might want to calculate the mean of the values in column B that fall into each of the bins defined by the edges [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] on column A. For the bin A = {0-1}, the mean of the B values falling within it would be 0.465408; for the bin A = {1-2}, it would be 0.522665; and so on.
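
To make the expected numbers concrete, here is a minimal sketch that computes a single bin by hand with a boolean mask, using the 20 values printed in the table above (copied in by hand, so the random draw is not needed). It treats the bin as 1 < A <= 2, i.e. right-closed; that boundary convention is an assumption chosen to match pandas' default binning.

```python
import pandas as pd

# The 20 (A, B) pairs from the sorted table above; index labels are dropped.
A = [0.057519, 1.610972, 1.725556, 1.734124, 1.944105, 3.265799, 3.352460,
     3.865299, 4.137863, 5.597172, 5.990105, 6.410582, 6.881429, 7.522877,
     8.093155, 8.699004, 8.755095, 9.135271, 9.440045, 9.654226]
B = [0.465408, 0.398077, 0.397708, 0.600723, 0.694152, 0.878538, 0.770505,
     0.064723, 0.659662, 0.122269, 0.667533, 0.193027, 0.041691, 0.268144,
     0.130559, 0.996624, 0.495984, 0.792966, 0.477514, 0.509812]
df = pd.DataFrame({'A': A, 'B': B})

# Boolean mask selecting the rows whose A value lies in the bin (1, 2].
in_bin = (df['A'] > 1) & (df['A'] <= 2)
bin_mean = df.loc[in_bin, 'B'].mean()
print(round(bin_mean, 6))  # prints 0.522665
```

Four rows fall in this bin, and (0.398077 + 0.397708 + 0.600723 + 0.694152) / 4 = 0.522665. Repeating the mask for every bin works but does not scale well, which is what the grouping approach in the answer avoids.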

I've found pandas.core.window.Rolling.mean (see https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.window.Rolling.mean.html), but it appears to calculate the mean over a window of specified length rather than over bins defined on another column.

Answer

Use cut to segment column A into bins, then apply groupby on these segments and calculate the mean value of B:

df.groupby(pd.cut(df['A'], bins=np.arange(11)), observed=False)['B'].mean()

Output:

A
(0, 1]     0.465408
(1, 2]     0.522665
(2, 3]          NaN
(3, 4]     0.571255
(4, 5]     0.659662
(5, 6]     0.394901
(6, 7]     0.117359
(7, 8]     0.268144
(8, 9]     0.541056
(9, 10]    0.593431
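
Because the question's data is random, the output above cannot be reproduced exactly from the snippet. A minimal, self-contained sketch that rebuilds the frame from the printed table values and runs the same pd.cut plus groupby should yield the same per-bin means (up to rounding). Passing observed=False makes explicit that empty bins such as (2, 3] are kept as NaN; recent pandas versions warn when that argument is omitted for a categorical grouper.

```python
import numpy as np
import pandas as pd

# The 20 (A, B) pairs from the question's sorted table.
A = [0.057519, 1.610972, 1.725556, 1.734124, 1.944105, 3.265799, 3.352460,
     3.865299, 4.137863, 5.597172, 5.990105, 6.410582, 6.881429, 7.522877,
     8.093155, 8.699004, 8.755095, 9.135271, 9.440045, 9.654226]
B = [0.465408, 0.398077, 0.397708, 0.600723, 0.694152, 0.878538, 0.770505,
     0.064723, 0.659662, 0.122269, 0.667533, 0.193027, 0.041691, 0.268144,
     0.130559, 0.996624, 0.495984, 0.792966, 0.477514, 0.509812]
df = pd.DataFrame({'A': A, 'B': B})

# Right-closed intervals (0, 1], (1, 2], ..., (9, 10] over column A.
bins = pd.cut(df['A'], bins=np.arange(11))

# observed=False keeps empty categories (here (2, 3]) in the result as NaN.
means = df.groupby(bins, observed=False)['B'].mean()
print(means.round(6))
```

Note that pd.cut intervals are right-closed by default, so a value of exactly 0 would fall outside (0, 1] and be left out of every group; passing include_lowest=True to pd.cut makes the first interval include its left edge.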

Update: you can use agg to apply several aggregation functions at once, for example mean, std and size:

df.groupby(pd.cut(df['A'], bins=np.arange(11)), observed=False)['B'].agg(['mean', 'std', 'size'])

Output:

             mean       std  size
A                                
(0, 1]   0.465408       NaN     1
(1, 2]   0.522665  0.149038     4
(2, 3]        NaN       NaN     0
(3, 4]   0.571255  0.441983     3
(4, 5]   0.659662       NaN     1
(5, 6]   0.394901  0.385560     2
(6, 7]   0.117359  0.107011     2
(7, 8]   0.268144       NaN     1
(8, 9]   0.541056  0.434788     3
(9, 10]  0.593431  0.173556     3
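
If pandas grouping is not wanted, the mean and size columns can also be computed with plain NumPy. This is a swapped-in technique (np.digitize to assign bin indices, then a weighted np.bincount for per-bin sums), not part of the answer above, and it does not compute std. It is a minimal sketch assuming every A value lies in (0, 10], so that each value maps to one of the ten bins.

```python
import numpy as np

# The 20 (A, B) pairs from the question's sorted table.
A = np.array([0.057519, 1.610972, 1.725556, 1.734124, 1.944105, 3.265799,
              3.352460, 3.865299, 4.137863, 5.597172, 5.990105, 6.410582,
              6.881429, 7.522877, 8.093155, 8.699004, 8.755095, 9.135271,
              9.440045, 9.654226])
B = np.array([0.465408, 0.398077, 0.397708, 0.600723, 0.694152, 0.878538,
              0.770505, 0.064723, 0.659662, 0.122269, 0.667533, 0.193027,
              0.041691, 0.268144, 0.130559, 0.996624, 0.495984, 0.792966,
              0.477514, 0.509812])

edges = np.arange(11)  # bin edges 0, 1, ..., 10
n_bins = len(edges) - 1
# Assumption: all A values are inside (0, 10]; otherwise indices fall out of range.
assert ((A > 0) & (A <= edges[-1])).all()

# right=True means edges[i-1] < x <= edges[i], matching pd.cut's (a, b] bins;
# subtracting 1 maps (0, 1] -> 0, ..., (9, 10] -> 9.
idx = np.digitize(A, edges, right=True) - 1

sums = np.bincount(idx, weights=B, minlength=n_bins)  # per-bin sum of B
counts = np.bincount(idx, minlength=n_bins)           # per-bin size

# Divide only where a bin is non-empty; empty bins stay NaN.
means = np.full(n_bins, np.nan)
np.divide(sums, counts, out=means, where=counts > 0)
print(np.round(means, 6))
print(counts)
```

The groupby version is usually easier to read and extends directly to other aggregations via agg; the NumPy version avoids building a categorical index, which may matter only for very large arrays.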
