How to get the monthly mean in pandas using groupby
Problem description
I have the following DataFrame:
data=pd.read_csv('anual.csv', parse_dates='Fecha', index_col=0)
data
DatetimeIndex: 290 entries, 2011-01-01 00:00:00 to 2011-12-31 00:00:00
Data columns (total 12 columns):
HR 290 non-null values
PreciAcu 290 non-null values
RadSolar 290 non-null values
T 290 non-null values
Presion 290 non-null values
Tmax 290 non-null values
HRmax 290 non-null values
Presionmax 290 non-null values
RadSolarmax 290 non-null values
Tmin 290 non-null values
HRmin 290 non-null values
Presionmin 290 non-null values
dtypes: float64(4), int64(8)
where:
data['HR']
Fecha
2011-01-01 37
2011-02-01 70
2011-03-01 62
2011-04-01 69
2011-05-01 72
2011-06-01 71
2011-07-01 71
2011-08-01 70
2011-09-01 40
...
2011-12-17 92
2011-12-18 78
2011-12-19 79
2011-12-20 76
2011-12-21 78
2011-12-22 80
2011-12-23 72
2011-12-24 70
In addition, some months are incomplete. My goal is to calculate the average for each month from the daily data. I do this as follows:
monthly=data.resample('M', how='mean')
HR PreciAcu RadSolar T Presion Tmax
Fecha
2011-01-31 68.586207 3.744828 163.379310 17.496552 0 25.875862
2011-02-28 68.666667 1.966667 208.000000 18.854167 0 28.879167
2011-03-31 69.136364 3.495455 218.090909 20.986364 0 30.359091
2011-04-30 68.956522 1.913043 221.130435 22.165217 0 31.708696
2011-05-31 72.700000 0.550000 201.100000 18.900000 0 27.460000
2011-06-30 70.821429 6.050000 214.000000 23.032143 0 30.621429
2011-07-31 78.034483 5.810345 188.206897 21.503448 0 27.951724
2011-08-31 71.750000 1.028571 214.750000 22.439286 0 30.657143
2011-09-30 72.481481 0.185185 196.962963 21.714815 0 29.596296
2011-10-31 68.083333 1.770833 224.958333 18.683333 0 27.075000
2011-11-30 71.750000 0.812500 169.625000 18.925000 0 26.237500
2011-12-31 71.833333 0.160000 159.533333 17.260000 0 25.403333
The first error I found is in the precipitation column (PreciAcu): all observations in January are 0, yet the average obtained for that month is 3.74.
When I compute the averages in Excel and compare them with the results above, there is significant variation. For example, the mean of HR for February is:
mean HR using pandas=68.66
mean HR using excel=67
Another detail I found:
data['PreciAcu']['2011-01'].count()
29 and should be 31
Am I doing something wrong? How can I fix this error?
Attached CSV file:
[link] https://www.dropbox.com/s/p5hl137bqm82j41/anual .csv
Recommended answer
Your date column is being misinterpreted because it is in DD/MM/YYYY format, so days and months are swapped whenever the day is 12 or less. Set dayfirst=True instead:
>>> df = pd.read_csv('anual.csv', parse_dates=['Fecha'], dayfirst=True, index_col=0, sep=r"\s+")
>>> df['PreciAcu']['2011-01'].count()
31
>>> df.resample("M").mean()
HR PreciAcu RadSolar T Presion Tmax \
Fecha
2011-01-31 68.774194 0.000000 162.354839 16.535484 0 25.393548
2011-02-28 67.000000 0.000000 193.481481 15.418519 0 25.696296
2011-03-31 59.083333 0.850000 254.541667 21.295833 0 32.325000
2011-04-30 61.200000 1.312000 260.640000 24.676000 0 34.760000
2011-05-31 NaN NaN NaN NaN NaN NaN
2011-06-30 68.428571 8.576190 236.619048 25.009524 0 32.028571
2011-07-31 81.518519 11.488889 185.407407 22.429630 0 27.681481
2011-08-31 76.451613 0.677419 219.645161 23.677419 0 30.719355
2011-09-30 77.533333 2.883333 196.100000 21.573333 0 28.723333
2011-10-31 73.120000 1.260000 196.280000 19.552000 0 27.636000
2011-11-30 71.277778 -79.333333 148.555556 18.250000 0 26.511111
2011-12-31 73.741935 0.067742 134.677419 15.687097 0 24.019355
HRmax Presionmax Tmin
Fecha
2011-01-31 92.709677 0 10.909677
2011-02-28 92.111111 0 8.325926
2011-03-31 89.291667 0 13.037500
2011-04-30 89.400000 0 17.328000
2011-05-31 NaN NaN NaN
2011-06-30 92.095238 0 19.761905
2011-07-31 97.185185 0 18.774074
2011-08-31 96.903226 0 18.670968
2011-09-30 97.200000 0 16.373333
2011-10-31 97.000000 0 13.412000
2011-11-30 94.555556 0 11.877778
2011-12-31 94.161290 0 10.070968
[12 rows x 9 columns]
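To see why the counts and means in the question came out wrong, here is a minimal sketch using a small made-up sample instead of the real anual.csv. The values and the variable names (csv_text, wrong, right) are assumptions for illustration only. It parses the same DD/MM/YYYY text with and without dayfirst=True and computes the monthly mean with groupby on the monthly period of the index:

```python
import io
import pandas as pd

# Hypothetical sample in DD/MM/YYYY format: three January days, two February days.
csv_text = """Fecha HR
01/01/2011 10
02/01/2011 20
03/01/2011 30
01/02/2011 40
02/02/2011 60
"""

# Default parsing reads ambiguous dates month-first, so 02/01/2011 becomes 1 February.
wrong = pd.read_csv(io.StringIO(csv_text), sep=r"\s+", index_col=0, parse_dates=True)

# dayfirst=True reads the same text as 2 January.
right = pd.read_csv(io.StringIO(csv_text), sep=r"\s+", index_col=0,
                    parse_dates=True, dayfirst=True)

# Monthly mean via groupby on the calendar month of each timestamp.
wrong_means = wrong.groupby(wrong.index.to_period("M"))["HR"].mean()
right_means = right.groupby(right.index.to_period("M"))["HR"].mean()

jan = pd.Period("2011-01", freq="M")
print(len(wrong_means), wrong_means[jan])  # rows are spread over three months
print(len(right_means), right_means[jan])  # only January and February remain
```

With the month-first parse, rows from a single month are scattered across several months, which matches the symptom in the question: too few January rows and monthly means that disagree with Excel.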
(Note, though, and I had forgotten this, that dayfirst=True isn't strict, see here. Using date_parser might be safer.)
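One strict alternative, sketched here under assumptions rather than taken from the original answer, is to read the date column as text and convert it with an explicit format through pd.to_datetime. The sample data and the names csv_text and strict_rejected are hypothetical. With an explicit format, a value that does not fit DD/MM/YYYY raises an error instead of being silently reinterpreted:

```python
import io
import pandas as pd

# Hypothetical sample in DD/MM/YYYY format.
csv_text = """Fecha HR
01/01/2011 37
02/01/2011 70
01/02/2011 69
"""

# Read the dates as plain strings first, then convert with an explicit format.
df = pd.read_csv(io.StringIO(csv_text), sep=r"\s+")
df["Fecha"] = pd.to_datetime(df["Fecha"], format="%d/%m/%Y")
df = df.set_index("Fecha")

# A month-first string does not fit the format (month 31 is invalid) and is rejected.
try:
    pd.to_datetime("12/31/2011", format="%d/%m/%Y")
    strict_rejected = False
except ValueError:
    strict_rejected = True

print(df.index[1], strict_rejected)
```

The explicit format costs one extra line, but it makes the assumption about the file layout visible and turns malformed dates into errors.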