Group a pandas time-series DataFrame using specific time intervals


Question

I have a large CSV file with timestamp data in ISO format, e.g. 2015-04-01 10:26:41. The data span multiple months, with entries ranging from 30 seconds to multiple hours apart. Its columns are id, time, speed.

Ultimately I want to group the data into 15-minute intervals and then calculate the average speed over however many entries fall in each 15-minute slot.

I am trying to use pandas because it seems to have solid time-series tools and this might be easy to do, but I am falling at the first hurdle.

So far I have imported the CSV as a DataFrame, and all columns have a dtype of object. I have sorted the data by date and am now trying to group the entries by a time interval, which is where I'm struggling. Based on Google searching, I tried to resample the data using df.resample('5min', how=sum). Here I get the error TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex. I was also thinking about trying the groupby method, perhaps with a lambda as in df.groupby(lambda x:x.minutes + 5), which produces the error AttributeError: 'str' object has no attribute 'minutes'.

Basically I'm a little confused as to a) whether pandas has the time-series data in a format it recognizes, since its dtype is object, and b) if it does recognize it, how to get the time intervals working.

Keen to learn if anyone could point me in the right direction.

The DataFrame looks like this:

        0        1                    2      3       
0          id  boat_id                 time  speed     
1      386226       32  2015-01-15 05:14:32      4.2343243      
2      386285       32  2015-01-15 05:44:57      3.45234  

Recommended answer

First, it looks like you read a blank row. You probably want to skip the first row of your file: pd.read_csv(filename, skiprows=1).
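As a rough illustration of why the header handling matters, the sketch below reads a small in-memory CSV two ways. The sample text and the use of header=None are assumptions made only to reproduce a frame shaped like the one in the question; the real file may differ, in which case skiprows=1 as suggested above may be the right fix.

```python
import io

import pandas as pd

# Hypothetical sample data modelled on the rows shown in the question.
csv_text = (
    "id,boat_id,time,speed\n"
    "386226,32,2015-01-15 05:14:32,4.2343243\n"
    "386285,32,2015-01-15 05:44:57,3.45234\n"
)

# With header=None the header line is treated as data, so the columns are
# numbered 0..3 and every column ends up holding strings.
as_data = pd.read_csv(io.StringIO(csv_text), header=None)

# By default the first line is used as the header, so the numeric columns
# are parsed as numbers and only "time" is left as text.
with_header = pd.read_csv(io.StringIO(csv_text))

print(as_data)
print(with_header.dtypes)
```

Once the header is parsed correctly, only the time column still needs converting.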

You should convert the text representation of the time into a DatetimeIndex using pd.to_datetime():

df.set_index(pd.to_datetime(df['time']), inplace=True)
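A minimal sketch of that conversion on a toy frame follows. The two rows are taken from the question; converting speed with pd.to_numeric and dropping the leftover text column are extra steps assumed here for the case where everything was read in as strings.

```python
import pandas as pd

# Toy frame in which both columns were read as text, as in the question.
df = pd.DataFrame({
    "time": ["2015-01-15 05:14:32", "2015-01-15 05:44:57"],
    "speed": ["4.2343243", "3.45234"],
})

# Convert the speed strings to floats so they can be averaged later.
df["speed"] = pd.to_numeric(df["speed"])

# Use the parsed timestamps as the index, then drop the original text column.
df = df.set_index(pd.to_datetime(df["time"])).drop(columns="time")

print(type(df.index))
print(df.dtypes)
```

With a DatetimeIndex in place, resample no longer raises the TypeError quoted in the question.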

You should then be able to resample:

df['speed'].resample('15min').mean()  # the how= argument was removed from resample() in later pandas releases
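Putting the steps together, here is a small end-to-end sketch using made-up timestamps and speeds (not data from the question). It uses the method-chaining form of resample, and also shows groupby with pd.Grouper as one possible alternative for anyone who prefers the groupby route mentioned in the question.

```python
import pandas as pd

# Made-up sample data; the timestamps are already parsed for brevity.
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2015-01-15 05:01:00",
        "2015-01-15 05:14:32",
        "2015-01-15 05:44:57",
        "2015-01-15 05:50:00",
    ]),
    "speed": [2.0, 4.0, 3.0, 5.0],
})

# Average speed per 15-minute bin; bins with no entries come out as NaN.
avg = df.set_index("time")["speed"].resample("15min").mean()

# The same grouping via groupby, without changing the index.
alt = df.groupby(pd.Grouper(key="time", freq="15min"))["speed"].mean()

# Drop the empty bins if only populated time slots are of interest.
populated = avg.dropna()

print(avg)
print(populated)
```

Bins are labelled by their left edge by default, so the entry at 05:14:32 falls into the slot labelled 05:00.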
