Splitting a dataframe into multiple 5-second dataframes and obtaining counts in Python


Problem description


    I have a relatively big dataset that I want to split into multiple dataframes in Python based on a column containing a datetime object. The values in the column (that I want to split the dataframe by) are given in the following format:

    1. 2015-11-01 00:00:05

    You may assume the dataframe looks like this.

    How can I split the dataframe into 5-second intervals in the following way:

    1. 1st dataframe 2015-11-01 00:00:00 - 2015-11-01 00:00:05,

    2. 2nd dataframe 2015-11-01 00:00:05 - 2015-11-01 00:00:10, and so on.

    I also need to count the number of observations in each of the resulting dataframes. In other words, it would be nice if I could get another dataframe with 2 columns (the desired output format can be found below):

    • 1st column represents the split group (the values of this column don't matter: they could simply be 1, 2, 3, ... indicating the order of the 5-second intervals; for example, 1 could refer to the period 2015-11-01 00:00:00 - 2015-11-01 00:00:05, 2 could refer to the period 2015-11-01 00:00:05 - 2015-11-01 00:00:10, and so on),
    • 2nd column shows the number of observations falling in each respective interval.

    Solution

    Create a dictionary of DataFrames with groupby and pd.Grouper, and add the new columns with assign:

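    import pandas as pd

    # Sample data: 100 rows, one per second (recent pandas deprecates the uppercase 'S' alias)
    rng = pd.date_range('2015-11-01 00:00:00', periods=100, freq='s')
    df = pd.DataFrame({'Date': rng, 'a': range(100)})
    print(df.head(10))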
                     Date  a
    0 2015-11-01 00:00:00  0
    1 2015-11-01 00:00:01  1
    2 2015-11-01 00:00:02  2
    3 2015-11-01 00:00:03  3
    4 2015-11-01 00:00:04  4
    5 2015-11-01 00:00:05  5
    6 2015-11-01 00:00:06  6
    7 2015-11-01 00:00:07  7
    8 2015-11-01 00:00:08  8
    9 2015-11-01 00:00:09  9
    
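    # Group rows into 5-second bins; each bin is labelled by its start time
    g = df.groupby(pd.Grouper(key='Date', freq='5s'))

    # One DataFrame per bin, keyed by the bin's start time as a string.
    # A numbers the rows within the bin; B is the bin's row count.
    dfs = {k.strftime('%Y-%m-%d %H:%M:%S'): v.assign(A=range(1, len(v) + 1), B=len(v))
           for k, v in g}

    print(dfs['2015-11-01 00:00:05'])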
                     Date  a  A  B
    5 2015-11-01 00:00:05  5  1  5
    6 2015-11-01 00:00:06  6  2  5
    7 2015-11-01 00:00:07  7  3  5
    8 2015-11-01 00:00:08  8  4  5
    9 2015-11-01 00:00:09  9  5  5
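
Since the question says the group labels may simply be 1, 2, 3, ..., the dictionary could also be keyed by interval number instead of a timestamp string. The following is only a sketch under that assumption, reusing the sample data from above; the name frames is introduced here for illustration and is not part of the original answer.

```python
import pandas as pd

# Same sample data as in the answer: 100 rows, one per second.
rng = pd.date_range('2015-11-01 00:00:00', periods=100, freq='s')
df = pd.DataFrame({'Date': rng, 'a': range(100)})

# Group on 5-second bins; each bin is labelled by its left edge.
g = df.groupby(pd.Grouper(key='Date', freq='5s'))

# Key each sub-frame by its 1-based position in chronological order.
# 'frames' is a hypothetical name used only for this sketch.
frames = {i: part for i, (start, part) in enumerate(g, start=1)}

print(len(frames))   # number of 5-second intervals in the sample
print(frames[2])     # rows from 00:00:05 up to, but not including, 00:00:10
```

Integer keys make it easy to walk the intervals in order (frames[1], frames[2], ...), at the cost of losing the start time from the key; the start time is still available as the first element of each pair yielded by the groupby.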
    

    If you only need the row counts, aggregate with size first, then add 1 to the index to create the Interval column:

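    # size() counts the rows in each 5-second bin; reset_index turns the result into a DataFrame
    df1 = df.groupby(pd.Grouper(key='Date', freq='5s')).size().reset_index(name='Count')
    # Number the intervals 1, 2, 3, ... in chronological order
    df1['Interval'] = df1.index + 1
    print(df1.head())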
                     Date  Count  Interval
    0 2015-11-01 00:00:00      5         1
    1 2015-11-01 00:00:05      5         2
    2 2015-11-01 00:00:10      5         3
    3 2015-11-01 00:00:15      5         4
    4 2015-11-01 00:00:20      5         5
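
The sample data above has exactly one row per second, so every interval holds 5 rows. With irregular timestamps, some intervals may contain no rows at all. The sketch below uses made-up timestamps (an assumption for illustration, not data from the question) to show how the same size-based approach handles such a gap.

```python
import pandas as pd

# Hypothetical irregular timestamps: nothing falls between 00:00:05 and 00:00:10.
df = pd.DataFrame({'Date': pd.to_datetime([
    '2015-11-01 00:00:01', '2015-11-01 00:00:03',
    '2015-11-01 00:00:11', '2015-11-01 00:00:12', '2015-11-01 00:00:14',
])})

# Count rows per 5-second bin, then number the bins starting from 1.
counts = (df.groupby(pd.Grouper(key='Date', freq='5s'))
            .size()
            .reset_index(name='Count'))
counts['Interval'] = counts.index + 1
print(counts)

# Keep only the intervals that contain at least one observation.
nonempty = counts[counts['Count'] > 0]
print(nonempty)
```

Here the empty middle interval is kept with a Count of 0, so the Interval numbers stay aligned with clock time. Filtering on Count afterwards, as in the last step, is one way to drop empty intervals if only populated ones are wanted.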
    
