pandas group by n seconds and apply arbitrary rolling function

Question

I have some CSV data of accelerometer readings in the following format (not exactly this; the real data has a higher sampling rate):


2013-09-28 17:36:50.322120,  0.152695, -0.545074, -0.852997
2013-09-28 17:36:50.622988,  0.141800, -0.554947, -0.867935
2013-09-28 17:36:51.923802,  0.132431, -0.547089, -0.879333
2013-09-28 17:36:52.124641,  0.124329, -0.530243, -0.887741
2013-09-28 17:36:52.425341,  0.122269, -0.519669, -0.900269
2013-09-28 17:36:52.926202,  0.122879, -0.502151, -0.902023
....
....
....
....
2013-09-28 17:49:14.440343,  0.005447, -0.623016, -0.773529
2013-09-28 17:49:14.557806,  0.009048, -0.623093, -0.790909
2013-09-28 17:49:14.758442,  0.007217, -0.617386, -0.815796

I load them using pandas:

import pandas as pd
accDF=pd.read_csv(accFileName,header=0, sep=',') 
accDF.columns=['time','x','y','z']
accDF=accDF.set_index(['time'])

The accelerometer data is not uniformly sampled, and I want to group the data into intervals of 10, 20, or 30 seconds and apply a custom function to each group.

If the data were uniformly sampled, it would be easy to apply a rolling function. Since it is not, I want to apply groupby using a timestamp interval. Doing so with an interval of one second is easy:

accDF_win=accDF.groupby(accDF.index.second).apply... etc

However, I cannot figure out how to group by an arbitrary number of seconds and then apply a function to each group.

With TimeGrouper, I can do the following:

accDF_win=accDF.groupby(pd.TimeGrouper(freq='3Min'))

for an arbitrary number of minutes, but it seems that TimeGrouper doesn't have 'second' resolution.

Thanks in advance for your help.

Recommended answer

First of all, you have to convert the datetime column to Python datetime objects (in case you didn't already).

>>> import pandas as pd
>>> from dateutil import parser
>>> df = pd.read_csv("test.csv", header=None)
>>> # convert column 0 to a datetime index, e.g. with dateutil
>>> df = df.set_index(df[0].map(parser.parse))
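As an alternative to dateutil, the conversion can probably be done with pandas' own pd.to_datetime. The following is a minimal sketch, not the original poster's code: the two sample rows are copied from the question, and the column names time, x, y, z are the ones the question assigns.

```python
import io
import pandas as pd

# Two sample rows in the question's format (no header line).
csv_text = (
    "2013-09-28 17:36:50.322120,  0.152695, -0.545074, -0.852997\n"
    "2013-09-28 17:36:50.622988,  0.141800, -0.554947, -0.867935\n"
)

df = pd.read_csv(
    io.StringIO(csv_text),
    header=None,                    # the file has no header row
    names=["time", "x", "y", "z"],  # column names as used in the question
    skipinitialspace=True,          # drop the blanks after each comma
)

# Convert the text timestamps and use them as the index.
df["time"] = pd.to_datetime(df["time"])
df = df.set_index("time")

print(type(df.index).__name__)
```

With a DatetimeIndex in place, time-based grouping and resampling can operate directly on the index.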

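Then use pd.TimeGrouper like this (pd.TimeGrouper was removed in later pandas releases; pd.Grouper(freq='10S') is its replacement):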

>>> df[3].groupby(pd.TimeGrouper('10S')).head()
2013-09-28 17:36:40  2013-09-28 17:36:40.322120   -0.852997
                     2013-09-28 17:36:41.622988   -0.867935
                     2013-09-28 17:36:42.923802   -0.879333
                     2013-09-28 17:36:43.124641   -0.887741
                     2013-09-28 17:36:45.425341   -0.900269
2013-09-28 17:36:50  2013-09-28 17:36:52.926202   -0.902023
                     2013-09-28 17:36:53.322120   -0.852997
                     2013-09-28 17:36:53.622988   -0.867935
                     2013-09-28 17:36:54.923802   -0.879333
                     2013-09-28 17:36:54.124641   -0.887741
2013-09-28 17:49:50  2013-09-28 17:49:56.440343   -0.773529
                     2013-09-28 17:49:56.557806   -0.790909
                     2013-09-28 17:49:57.758442   -0.815796
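To apply a custom function to each 10-second group rather than just inspecting it, something like the following sketch may help. It uses pd.Grouper, which replaced pd.TimeGrouper in later pandas releases. The sample values and the peak_to_peak function are made up for illustration and are not part of the original answer.

```python
import pandas as pd

# Irregularly sampled toy data; the values are invented for illustration.
index = pd.to_datetime([
    "2013-09-28 17:36:50.3",
    "2013-09-28 17:36:55.6",
    "2013-09-28 17:37:01.9",
    "2013-09-28 17:37:02.1",
    "2013-09-28 17:37:12.4",
])
z = pd.Series([1.0, 3.0, 2.0, 6.0, 5.0], index=index)


def peak_to_peak(window):
    """Hypothetical custom function: spread of the samples in one bin."""
    return window.max() - window.min()


# Bins are 10 seconds wide and labelled by their left edge.
grouped = z.groupby(pd.Grouper(freq="10s")).apply(peak_to_peak)
print(grouped)
```

Changing freq to "20s" or "30s" should give the other interval widths mentioned in the question.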

Or have a look at the resampling functions in pandas. Maybe you could apply a custom resampling function instead of using the groupby method.

df[3].resample("10S",how=lambda x: Whateveryouwanttodo)
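In more recent pandas releases the how= argument is no longer available, and the custom function is passed through .apply() on the resampler instead. A minimal sketch of that form, with invented sample values and a hypothetical rms function standing in for "whatever you want to do":

```python
import pandas as pd

# Irregularly sampled toy data; the values are invented for illustration.
index = pd.to_datetime([
    "2013-09-28 17:36:50.3",
    "2013-09-28 17:36:55.6",
    "2013-09-28 17:37:01.9",
    "2013-09-28 17:37:25.1",
])
z = pd.Series([1.0, 3.0, 2.0, 6.0], index=index)


def rms(window):
    """Hypothetical custom function: root mean square of one bin."""
    return (window ** 2).mean() ** 0.5


# Each 10-second bin is passed to rms() as a Series.
result = z.resample("10s").apply(rms)
print(result)
```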

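Without any function, it falls back to the default aggregation (the mean, in the pandas version used here) and fills intervals that contain no samples with NaN: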

>>> df[3].resample("10S")
0
2013-09-28 17:36:40   -0.877655
2013-09-28 17:36:50   -0.884617
2013-09-28 17:37:00         NaN
2013-09-28 17:37:10         NaN
2013-09-28 17:37:20         NaN
2013-09-28 17:37:30         NaN
2013-09-28 17:37:40         NaN
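If a true rolling (overlapping) window is wanted rather than fixed bins, later pandas releases also accept a time offset as the rolling window on a DatetimeIndex. This is a different technique from the groupby and resample approaches in this answer; the sketch below uses invented sample values and a made-up spread function.

```python
import pandas as pd

# Irregularly sampled toy data; the values are invented for illustration.
index = pd.to_datetime([
    "2013-09-28 17:36:50.0",
    "2013-09-28 17:36:53.0",
    "2013-09-28 17:36:58.0",
    "2013-09-28 17:37:05.0",
])
z = pd.Series([1.0, 2.0, 3.0, 4.0], index=index)

# For each sample, look back over the preceding 10 seconds
# (the current sample is included) and compute the spread.
rolled = z.rolling("10s").apply(lambda w: w.max() - w.min(), raw=False)
print(rolled)
```

Each output row corresponds to one input sample, so the result keeps the original irregular timestamps instead of regular 10-second labels.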
