Average over a specific time period

Question

I have a very large table in Python, loaded from a .h5 file. The start of the table looks something like this:

table =
                [WIND REL DIRECTION  [deg]]  [WIND SPEED  [kts]]  \
735381.370833                            0             0.000000   
735381.370845                            0             0.000000   
735381.370880                            0             0.000000   
735381.370891                            0             0.000000   
735381.370903                            0             0.000000   
735381.370972                            0             0.000000   
735381.370984                            0             0.000000   
735381.370995                            0             0.000000   
735381.371007                            0             0.000000   
735381.371019                            0             0.000000   
...

The index holds the timestamp of each row. I need to calculate the average WIND REL DIRECTION and WIND SPEED over every 15-second interval and turn each interval into a single row. This has to be done efficiently, because the .h5 file is huge.

Here is some of the relevant code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as pltd
import tables  # PyTables, the HDF5 backend pandas uses for .h5 files

# `table` is the DataFrame read from the .h5 file
dates = pltd.num2date(table.index)  # turn the timestamps into dates
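
For reference, the index values are fractional day counts on matplotlib's pre-3.3 date scale, which counts 0001-01-01 UTC as day 1 and therefore coincides with Python's `date.toordinal()`: 735381 is 2014-05-28, and the fraction .370833 is just before 08:54. Since matplotlib 3.3, `num2date` uses a 1970-01-01 epoch by default, so the same numbers no longer land in 2014. A minimal pandas-only sketch of the conversion, assuming the index really is ordinal days in UTC; the helper name `ordinal_days_to_datetime` is hypothetical:

```python
from datetime import date

import numpy as np
import pandas as pd

# Python ordinal (date.toordinal()) of the Unix epoch 1970-01-01, i.e. 719163
UNIX_EPOCH_ORDINAL = date(1970, 1, 1).toordinal()


def ordinal_days_to_datetime(values):
    """Hypothetical helper: convert fractional proleptic-Gregorian ordinal
    days (matplotlib's pre-3.3 date numbers) into a UTC DatetimeIndex."""
    # Shift the day count so that 0 means 1970-01-01, then let pandas
    # turn fractional days into nanosecond timestamps
    days_since_epoch = np.asarray(values, dtype="float64") - UNIX_EPOCH_ORDINAL
    return pd.to_datetime(days_since_epoch, unit="D", utc=True)


# First few index values from the table in the question
stamps = [735381.370833, 735381.370845, 735381.370880]
converted = ordinal_days_to_datetime(stamps)
print(converted)
```

Under that assumption, the result can be assigned back with `table.index = ordinal_days_to_datetime(table.index)` before resampling, which keeps the whole pipeline vectorized in pandas.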

I am quite clueless here; any help is appreciated.

Recommended answer

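resample is your friend: it buckets the rows into fixed time bins (here 15 seconds) and aggregates each bin, for example with the mean.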

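# matplotlib >= 3.3 counts days from 1970-01-01; restore the old epoch
# that these ordinal timestamps use (must run before any conversion)
pltd.set_epoch('0000-12-31T00:00:00')
idx = pltd.num2date(table.index)
# random stand-in data on the real timestamps
df = pd.DataFrame({'direction': np.random.randn(10),
                   'speed': np.random.randn(10)},
                  index=idx)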

>>> df
                                  direction     speed
2014-05-28 08:53:59.971204+00:00   0.205429  0.699439
2014-05-28 08:54:01.008002+00:00   0.383199 -0.392261
2014-05-28 08:54:04.031995+00:00  -2.146569 -0.325526
2014-05-28 08:54:04.982402+00:00   1.572352  1.289276
2014-05-28 08:54:06.019200+00:00   0.880394 -0.440667
2014-05-28 08:54:11.980795+00:00  -1.343758  0.615725
2014-05-28 08:54:13.017603+00:00  -1.713043  0.552017
2014-05-28 08:54:13.968000+00:00  -0.350017  0.728910
2014-05-28 08:54:15.004798+00:00  -0.619273  0.286762
2014-05-28 08:54:16.041596+00:00   0.459747  0.524788

>>> df.resample('15s').mean()  # older pandas: df.resample('15S', how='mean')
                           direction     speed
2014-05-28 08:53:45+00:00   0.205429  0.699439
2014-05-28 08:54:00+00:00  -0.388206  0.289639
2014-05-28 08:54:15+00:00  -0.079763  0.405775
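
The `how=` keyword used in the original answer was deprecated in pandas 0.18 and has since been removed; `resample` now returns a resampler object and the aggregation is a method call. As a check, this sketch rebuilds the printed `df` from its displayed (rounded) values and resamples it with the current API; it should reproduce the three rows shown above, up to the display rounding:

```python
import pandas as pd

# Timestamps and values copied from the printed `df` above (UTC, values
# rounded to six decimals by the display)
index = pd.to_datetime([
    "2014-05-28 08:53:59.971204", "2014-05-28 08:54:01.008002",
    "2014-05-28 08:54:04.031995", "2014-05-28 08:54:04.982402",
    "2014-05-28 08:54:06.019200", "2014-05-28 08:54:11.980795",
    "2014-05-28 08:54:13.017603", "2014-05-28 08:54:13.968000",
    "2014-05-28 08:54:15.004798", "2014-05-28 08:54:16.041596",
], utc=True)
df = pd.DataFrame({
    "direction": [0.205429, 0.383199, -2.146569, 1.572352, 0.880394,
                  -1.343758, -1.713043, -0.350017, -0.619273, 0.459747],
    "speed": [0.699439, -0.392261, -0.325526, 1.289276, -0.440667,
              0.615725, 0.552017, 0.728910, 0.286762, 0.524788],
}, index=index)

# Bins are left-closed and labelled by their start (08:53:45, 08:54:00, ...);
# .mean() performs the aggregation that how='mean' used to select
means = df.resample("15s").mean()
print(means)
```

Each output row is the mean of every observation whose timestamp falls in that 15-second bin, so the 1-row, 7-row and 2-row bins above collapse to one row each.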

Performance is similar to the groupby method provided by @LondonRob. I tested with a DataFrame of 1 million rows; the timings below come from an old pandas release and will differ on current versions.

df = pd.DataFrame({'direction': np.random.randn(10**6),
                   'speed': np.random.randn(10**6)},
                  index=pd.date_range(start='2015-1-1', periods=10**6, freq='1s'))

>>> %timeit df.resample('15s').mean()
100 loops, best of 3: 15.6 ms per loop

>>> %timeit df.groupby(pd.Grouper(freq='15s')).mean()
100 loops, best of 3: 15.7 ms per loop
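
`pd.TimeGrouper` has since been removed from pandas in favor of `pd.Grouper(freq=...)`. The sketch below checks, on seeded random data (an assumption; the original used unseeded `np.random.randn`), that the two current spellings produce identical 15-second means on the same 1-million-row frame; timing them is left to `%timeit` in IPython:

```python
import numpy as np
import pandas as pd

n = 10**6
rng = np.random.default_rng(0)  # seeded so the run is reproducible
df = pd.DataFrame(
    {"direction": rng.standard_normal(n), "speed": rng.standard_normal(n)},
    index=pd.date_range(start="2015-01-01", periods=n, freq="1s"),
)

# Two spellings of the same 15-second bucketing in current pandas
by_resample = df.resample("15s").mean()
by_grouper = df.groupby(pd.Grouper(freq="15s")).mean()

# Compare labels and values; Index.equals ignores the index `freq` attribute
same = by_resample.index.equals(by_grouper.index) and np.allclose(
    by_resample.to_numpy(), by_grouper.to_numpy()
)
print(same, len(by_resample))
```

Since both build the same time bins internally, similar timings are what one would expect; the choice between them is mostly readability.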
