Python: Calculate average for each hour in CSV?


Question

I want to calculate the average for each hour using a CSV file:

Here is my dataset:

Timestamp    Temperature
9/1/2016 0:00:08    53.8
9/1/2016 0:00:38    53.8
9/1/2016 0:01:08    53.8
9/1/2016 0:01:38    53.8
9/1/2016 0:02:08    53.8
9/1/2016 0:02:38    54.1
9/1/2016 0:03:08    54.1
9/1/2016 0:03:38    54.1
9/1/2016 0:04:38    54
9/1/2016 0:05:38    54
9/1/2016 0:06:08    54
9/1/2016 0:06:38    54
9/1/2016 0:07:08    54
9/1/2016 0:07:38    54
9/1/2016 0:08:08    54.1
9/1/2016 0:08:38    54.1
9/1/2016 0:09:38    54.1
9/1/2016 0:10:32    54
9/1/2016 0:11:02    54
9/1/2016 0:11:32    54
9/1/2016 0:00:08    54
9/2/2016 0:00:20    32
9/2/2016 0:00:50    32
9/2/2016 0:01:20    32
9/2/2016 0:01:50    32
9/2/2016 0:02:20    32
9/2/2016 0:02:50    32
9/2/2016 0:03:20    32
9/2/2016 0:03:50    32
9/2/2016 0:04:20    32
9/2/2016 0:04:50    32
9/2/2016 0:05:20    32
9/2/2016 0:05:50    32
9/2/2016 0:06:20    32
9/2/2016 0:06:50    32
9/2/2016 0:07:20    32
9/2/2016 0:07:50    32

Here is my code for calculating the per-day average, but I want it per hour:

from datetime import datetime
import pandas

def same_day(date_string):  # Remove year
    return datetime.strptime(date_string, "%m/%d/%Y %H:%M:%S").strftime('%m%d')

df = pandas.read_csv('/home/kk/Desktop/cal_Avg.csv',index_col=0,usecols=[0, 1], names=['Timestamp', 'Discharge'],converters={'Timestamp': same_day})

print(df.groupby(level=0).mean())

The output I want is:

Timestamp              Temp          *        Avg
9/1/2016 0:00:08    53.8
9/1/2016 0:00:38    53.8    ?avg for this hour
9/1/2016 0:01:08    53.8
9/1/2016 0:01:38    53.8    ?avg for this hour
9/1/2016 0:02:08    53.8
9/1/2016 0:02:38    54.1

Now I want the average for specific hours, Min

Desired output:

Here I am printing only 5 hours of output for the dates 01-09-2016 and 02-09-2016:

010900              54.362727         45.497273
010901              54.723276         45.068103
010902              54.746847         45.370270
010903              54.833913         44.931304
010904              54.971053         44.835088
010905              55.519444         44.459259
020901              31.742553         55.640426
020902              31.495556         55.655556
020903              31.304348         55.442609
020904              31.200000         55.437273
020905              31.294382         55.442697

Specific dates and their specific hours? How do I achieve this?

Recommended answer

I think you first need read_csv with the parameters index_col=[0], to read the first column into the index, and parse_dates=[0], to parse that column into a DatetimeIndex:

import pandas as pd

df = pd.read_csv('filename', index_col=[0], parse_dates=[0], usecols=[0, 1])
print (df)
                     Temperature
Timestamp                       
2016-09-01 00:00:08         53.8
2016-09-01 00:00:38         53.8
2016-09-01 00:01:08         53.8
2016-09-01 00:01:38         53.8
2016-09-01 00:02:08         53.8
2016-09-01 00:02:38         54.1
2016-09-01 00:03:08         54.1
...
...
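
To try the read step without a file on disk, here is a minimal sketch that loads a few of the question's rows from an in-memory string. The sample text, the comma delimiter, and the helper name load_readings are assumptions made for illustration; the question does not show the raw file.

```python
import io

import pandas as pd

# Hypothetical sample: a few rows from the question, assumed comma-separated.
SAMPLE_CSV = """Timestamp,Temperature
9/1/2016 0:00:08,53.8
9/1/2016 0:00:38,53.8
9/1/2016 0:02:38,54.1
9/2/2016 0:00:20,32
"""


def load_readings(source):
    """Read the first two columns and parse column 0 into a DatetimeIndex."""
    return pd.read_csv(source, index_col=[0], parse_dates=[0], usecols=[0, 1])


df = load_readings(io.StringIO(SAMPLE_CSV))
print(df.index.dtype)  # a datetime64 dtype if the timestamps were parsed
print(df)
```

If the index dtype comes back as object instead, the timestamps were not parsed, and resample will not work on the result.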

Then resample by hour and aggregate with Resampler.mean. Hours that have no rows in the DatetimeIndex come out as NaN:

print (df.resample('H').mean())
                     Temperature
Timestamp                       
2016-09-01 00:00:00    53.980952
2016-09-01 01:00:00          NaN
2016-09-01 02:00:00          NaN
2016-09-01 03:00:00          NaN
2016-09-01 04:00:00          NaN
2016-09-01 05:00:00          NaN
2016-09-01 06:00:00          NaN
2016-09-01 07:00:00          NaN
2016-09-01 08:00:00          NaN
2016-09-01 09:00:00          NaN
2016-09-01 10:00:00          NaN
2016-09-01 11:00:00          NaN
2016-09-01 12:00:00          NaN
2016-09-01 13:00:00          NaN
2016-09-01 14:00:00          NaN
2016-09-01 15:00:00          NaN
2016-09-01 16:00:00          NaN
2016-09-01 17:00:00          NaN
2016-09-01 18:00:00          NaN
2016-09-01 19:00:00          NaN
2016-09-01 20:00:00          NaN
2016-09-01 21:00:00          NaN
2016-09-01 22:00:00          NaN
2016-09-01 23:00:00          NaN
2016-09-02 00:00:00    32.000000
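
If the empty hours are not wanted, one option is to drop them after resampling. The sketch below uses made-up readings, and it passes a Timedelta as the resample rule instead of the 'H' string, because recent pandas releases deprecate that alias in favour of 'h'.

```python
import pandas as pd

# Hypothetical readings: two in hour 00 of 1 Sep, then nothing until 2 Sep.
index = pd.to_datetime([
    "2016-09-01 00:00:08",
    "2016-09-01 00:00:38",
    "2016-09-02 00:00:20",
])
df = pd.DataFrame({"Temperature": [53.8, 54.2, 32.0]}, index=index)

# One bin per hour between the first and last reading; empty bins become NaN.
hourly = df.resample(pd.Timedelta(hours=1)).mean()

# Keep only the hours that actually contained readings.
hourly_with_data = hourly.dropna()
print(hourly_with_data)
```

Here hourly has one row for every hour in the range, while hourly_with_data keeps only the two hours that had readings.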

Another solution is to drop the minutes and seconds by casting the index values to hour precision, then group by that array:

print (df.index.values.astype('<M8[h]'))
['2016-09-01T00' '2016-09-01T00' '2016-09-01T00' '2016-09-01T00'
 '2016-09-01T00' '2016-09-01T00' '2016-09-01T00' '2016-09-01T00'
 '2016-09-01T00' '2016-09-01T00' '2016-09-01T00' '2016-09-01T00'
 '2016-09-01T00' '2016-09-01T00' '2016-09-01T00' '2016-09-01T00'
 '2016-09-01T00' '2016-09-01T00' '2016-09-01T00' '2016-09-01T00'
 '2016-09-01T00' '2016-09-02T00' '2016-09-02T00' '2016-09-02T00'
 '2016-09-02T00' '2016-09-02T00' '2016-09-02T00' '2016-09-02T00'
 '2016-09-02T00' '2016-09-02T00' '2016-09-02T00' '2016-09-02T00'
 '2016-09-02T00' '2016-09-02T00' '2016-09-02T00' '2016-09-02T00'
 '2016-09-02T00']

print (df.groupby([df.index.values.astype('<M8[h]')]).mean())
            Temperature
2016-09-01    53.980952
2016-09-02    32.000000

Also, if you need the mean by month, day and hour, you can group by DatetimeIndex.strftime:

print (df.index.strftime('%m%d%H'))
['090100' '090100' '090100' '090100' '090100' '090100' '090100' '090100'
 '090100' '090100' '090100' '090100' '090100' '090100' '090100' '090100'
 '090100' '090100' '090100' '090100' '090100' '090200' '090200' '090200'
 '090200' '090200' '090200' '090200' '090200' '090200' '090200' '090200'
 '090200' '090200' '090200' '090200' '090200']

print (df.groupby([df.index.strftime('%m%d%H')]).mean())
        Temperature
090100    53.980952
090200    32.000000
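
The question's desired output has two value columns and mentions "Min". Assuming that means the minimum per hour (the question does not say), a sketch like the following computes both statistics in one pass. The readings and the helper name hourly_stats are made up for illustration. The keys in the question look like day-month-hour, so passing '%d%m%H' as the format would reproduce that ordering.

```python
import pandas as pd

# Hypothetical readings spread over three distinct (day, hour) buckets.
index = pd.to_datetime([
    "2016-09-01 00:00:08",
    "2016-09-01 00:30:00",
    "2016-09-01 01:15:00",
    "2016-09-02 00:00:20",
])
df = pd.DataFrame({"Temperature": [53.8, 54.2, 55.0, 32.0]}, index=index)


def hourly_stats(frame, key_format="%m%d%H"):
    """Group rows by a formatted (month, day, hour) key; return mean and min."""
    hour_key = frame.index.strftime(key_format)
    return frame.groupby(hour_key)["Temperature"].agg(["mean", "min"])


summary = hourly_stats(df)
print(summary)

# Same grouping, but with day-month-hour keys like those in the question.
summary_dmh = hourly_stats(df, key_format="%d%m%H")
print(summary_dmh)
```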

Or, if you need the mean by hour only, group by DatetimeIndex.hour:

print (df.index.hour)
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]

print (df.groupby([df.index.hour]).mean())
   Temperature
0    44.475676
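
Grouping by hour alone pools every day together, which is why the single value above mixes both dates. To get hourly means for one specific date and a range of hours, a possible approach is to filter first and then group. The helper name hourly_mean_for, its parameters, and the readings below are hypothetical.

```python
import pandas as pd

# Hypothetical readings across two days and several hours.
index = pd.to_datetime([
    "2016-09-01 00:00:08",
    "2016-09-01 00:30:00",
    "2016-09-01 01:15:00",
    "2016-09-01 07:00:00",
    "2016-09-02 00:00:20",
])
df = pd.DataFrame({"Temperature": [53.8, 54.2, 55.0, 60.0, 32.0]}, index=index)


def hourly_mean_for(frame, day, first_hour, last_hour):
    """Mean per hour for one calendar day, limited to an inclusive hour range."""
    on_day = frame.index.normalize() == pd.Timestamp(day)
    in_hours = (frame.index.hour >= first_hour) & (frame.index.hour <= last_hour)
    selected = frame[on_day & in_hours]
    return selected.groupby(selected.index.hour).mean()


result = hourly_mean_for(df, "2016-09-01", 0, 5)
print(result)
```

The boolean masks avoid relying on a sorted index, which matters here because the question's data contains an out-of-order timestamp.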
