如何使用groupby获得大 pandas 的每月均值 [英] How get monthly mean in pandas using groupby

查看:106
本文介绍了如何使用groupby获得大 pandas 的每月均值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有下一个DataFrame:

I have the next DataFrame:

data=pd.read_csv('anual.csv', parse_dates='Fecha', index_col=0)
data

DatetimeIndex: 290 entries, 2011-01-01 00:00:00 to 2011-12-31 00:00:00
Data columns (total 12 columns):
HR             290  non-null values
PreciAcu       290  non-null values
RadSolar       290  non-null values
T              290  non-null values
Presion        290  non-null values
Tmax           290  non-null values
HRmax          290  non-null values
Presionmax     290  non-null values
RadSolarmax    290  non-null values
Tmin           290  non-null values
HRmin          290  non-null values
Presionmin     290  non-null values
dtypes: float64(4), int64(8)

其中:

data['HR']

Fecha
2011-01-01    37
2011-02-01    70
2011-03-01    62
2011-04-01    69
2011-05-01    72
2011-06-01    71
2011-07-01    71
2011-08-01    70
2011-09-01    40
...
2011-12-17    92
2011-12-18    78
2011-12-19    79
2011-12-20    76
2011-12-21    78
2011-12-22    80
2011-12-23    72
2011-12-24    70

此外,某些月份并不总是完整的.我的目标是根据每日数据计算每个月的平均值.这可以通过以下方式实现:

In addition, some months are not always complete. My goal is to calculate the average of each month from daily data. This is achieved as follows:

monthly=data.resample('M', how='mean')

                HR   PreciAcu    RadSolar         T        Presion     Tmax  
Fecha                                                                         
2011-01-31  68.586207   3.744828  163.379310  17.496552        0  25.875862   
2011-02-28  68.666667   1.966667  208.000000  18.854167        0  28.879167   
2011-03-31  69.136364   3.495455  218.090909  20.986364        0  30.359091   
2011-04-30  68.956522   1.913043  221.130435  22.165217        0  31.708696   
2011-05-31  72.700000   0.550000  201.100000  18.900000        0  27.460000   
2011-06-30  70.821429   6.050000  214.000000  23.032143        0  30.621429   
2011-07-31  78.034483   5.810345  188.206897  21.503448        0  27.951724   
2011-08-31  71.750000   1.028571  214.750000  22.439286        0  30.657143   
2011-09-30  72.481481   0.185185  196.962963  21.714815        0  29.596296    
2011-10-31  68.083333   1.770833  224.958333  18.683333        0  27.075000   
2011-11-30  71.750000   0.812500  169.625000  18.925000        0  26.237500   
2011-12-31  71.833333   0.160000  159.533333  17.260000        0  25.403333 

我发现的第一个误差是在降水量列中,因为在一月份的所有观测值均为0,并且该特定月份的平均值为3.74.

The first error I find is in the column of precipitation, since all observations are 0 in January and an average of 3.74 is obtained for this particular month.

当Excel中的平均值与上面的结果进行比较时,会有很大的差异.例如,Febrero的HR平均值是

When averages in Excel and compare them with the results above, there is significant variation. For Example, the mean of HR for Febrero is

   mean HR using pandas=68.66

   mean HR using excel=67 

我发现的另一个细节:

   data['PreciAcu']['2011-01'].count()

   29 and should be 31

我做错什么了吗? 我该如何解决该错误?

Am I doing something wrong? How I can fix this error?

附件csv文件:

[link] https://www.dropbox.com/s/p5hl137bqm82j41/anual .csv

推荐答案

您的日期列被误解,因为它采用DD/MM/YYYY格式.改为设置dayfirst=True:

Your date column is being misinterpreted, because it's in DD/MM/YYYY format. Set dayfirst=True instead:

>>> df = pd.read_csv('anual.csv', parse_dates='Fecha', dayfirst=True, index_col=0, sep="\s+")
>>> df['PreciAcu']['2011-01'].count()
31
>>> df.resample("M").mean()
                   HR   PreciAcu    RadSolar          T  Presion       Tmax  \
Fecha                                                                         
2011-01-31  68.774194   0.000000  162.354839  16.535484        0  25.393548   
2011-02-28  67.000000   0.000000  193.481481  15.418519        0  25.696296   
2011-03-31  59.083333   0.850000  254.541667  21.295833        0  32.325000   
2011-04-30  61.200000   1.312000  260.640000  24.676000        0  34.760000   
2011-05-31        NaN        NaN         NaN        NaN      NaN        NaN   
2011-06-30  68.428571   8.576190  236.619048  25.009524        0  32.028571   
2011-07-31  81.518519  11.488889  185.407407  22.429630        0  27.681481   
2011-08-31  76.451613   0.677419  219.645161  23.677419        0  30.719355   
2011-09-30  77.533333   2.883333  196.100000  21.573333        0  28.723333   
2011-10-31  73.120000   1.260000  196.280000  19.552000        0  27.636000   
2011-11-30  71.277778 -79.333333  148.555556  18.250000        0  26.511111   
2011-12-31  73.741935   0.067742  134.677419  15.687097        0  24.019355   

                HRmax  Presionmax       Tmin  
Fecha                                         
2011-01-31  92.709677           0  10.909677  
2011-02-28  92.111111           0   8.325926  
2011-03-31  89.291667           0  13.037500  
2011-04-30  89.400000           0  17.328000  
2011-05-31        NaN         NaN        NaN  
2011-06-30  92.095238           0  19.761905  
2011-07-31  97.185185           0  18.774074  
2011-08-31  96.903226           0  18.670968  
2011-09-30  97.200000           0  16.373333  
2011-10-31  97.000000           0  13.412000  
2011-11-30  94.555556           0  11.877778  
2011-12-31  94.161290           0  10.070968  

[12 rows x 9 columns]


(不过,请注意-我忘了这一点-dayfirst=True并不严格,请参见


(Note, though - I'd forgotten this -- that dayfirst=True isn't strict, see here. Maybe using date_parser would be safer.)

这篇关于如何使用groupby获得大 pandas 的每月均值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆