Splitting a dataframe into multiple 5-second dataframes and obtaining count in Python
Problem description
I have a relatively big dataset that I want to split into multiple dataframes in Python based on a column containing a datetime object. The values in the column (that I want to split the dataframe by) are given in the following format:
2015-11-01 00:00:05
You may assume the dataframe looks like this.
How can I split the dataframe into 5-second intervals in the following way: the 1st dataframe covers 2015-11-01 00:00:00 - 2015-11-01 00:00:05, the 2nd dataframe covers 2015-11-01 00:00:05 - 2015-11-01 00:00:10, and so on?
I also need to count the number of observations in each of the resulting dataframes. In other words, it would be nice if I could get another dataframe with 2 columns (the desired output format can be found below):
- 1st column represents the split group (the values of this column don't matter: they could simply be 1, 2, 3, ... indicating the order of the 5-second intervals; for example, 1 could refer to the period 2015-11-01 00:00:00 - 2015-11-01 00:00:05, 2 could refer to the period 2015-11-01 00:00:05 - 2015-11-01 00:00:10, and so on),
- 2nd column shows the number of observations falling in each respective interval.
Solution
Create a dictionary of DataFrames and add new columns with assign:

    import pandas as pd

    # Sample data: 100 rows, one per second.
    # Lowercase 's' is the seconds alias; uppercase 'S' is deprecated in recent pandas.
    rng = pd.date_range('2015-11-01 00:00:00', periods=100, freq='s')
    df = pd.DataFrame({'Date': rng, 'a': range(100)})
    print(df.head(10))

                     Date  a
    0 2015-11-01 00:00:00  0
    1 2015-11-01 00:00:01  1
    2 2015-11-01 00:00:02  2
    3 2015-11-01 00:00:03  3
    4 2015-11-01 00:00:04  4
    5 2015-11-01 00:00:05  5
    6 2015-11-01 00:00:06  6
    7 2015-11-01 00:00:07  7
    8 2015-11-01 00:00:08  8
    9 2015-11-01 00:00:09  9

    # Group the rows into 5-second bins based on the Date column.
    g = df.groupby(pd.Grouper(key='Date', freq='5s'))
    # One DataFrame per bin, keyed by the bin's start time.
    # A numbers the rows within the bin; B holds the bin's row count.
    dfs = {k.strftime('%Y-%m-%d %H:%M:%S'): v.assign(A=range(1, len(v) + 1), B=len(v)) for k, v in g}
    print(dfs['2015-11-01 00:00:05'])

                     Date  a  A  B
    5 2015-11-01 00:00:05  5  1  5
    6 2015-11-01 00:00:06  6  2  5
    7 2015-11-01 00:00:07  7  3  5
    8 2015-11-01 00:00:08  8  4  5
    9 2015-11-01 00:00:09  9  5  5
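A side note on boundaries: with a 5-second frequency, pandas treats each bin as half-open, [start, start + 5s), so a row stamped exactly 00:00:05 belongs to the second interval, not the first. The sketch below illustrates this on a small, hypothetical set of irregular timestamps (not data from the question). It uses Series.dt.floor as an alternative to pd.Grouper, so only bins that actually contain rows end up in the dictionary.

```python
import pandas as pd

# Hypothetical, irregularly spaced sample data (an assumption for illustration).
df = pd.DataFrame({
    'Date': pd.to_datetime([
        '2015-11-01 00:00:01',
        '2015-11-01 00:00:04',
        '2015-11-01 00:00:05',  # exactly on a bin boundary
        '2015-11-01 00:00:12',
    ]),
    'a': [10, 20, 30, 40],
})

# Floor each timestamp to the start of its 5-second bin.
bin_start = df['Date'].dt.floor('5s')

# One sub-DataFrame per non-empty bin, keyed by the bin's start time.
parts = {k.strftime('%Y-%m-%d %H:%M:%S'): v for k, v in df.groupby(bin_start)}

for label, part in parts.items():
    print(label, len(part))
```

Because the grouping key is computed from the rows themselves, intervals with no observations simply do not appear here; whether that is desirable depends on how the sub-dataframes are used afterwards.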
If you only need the row counts, aggregate with size first; for the Interval column, add 1 to the index:

    # Count the rows in each 5-second bin and number the bins from 1.
    df1 = df.groupby(pd.Grouper(key='Date', freq='5s')).size().reset_index(name='Count')
    df1['Interval'] = df1.index + 1
    print(df1.head())

                     Date  Count  Interval
    0 2015-11-01 00:00:00      5         1
    1 2015-11-01 00:00:05      5         2
    2 2015-11-01 00:00:10      5         3
    3 2015-11-01 00:00:15      5         4
    4 2015-11-01 00:00:20      5         5
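The sample data in the answer has one row per second, so every interval holds exactly 5 rows. With real, irregular data some intervals may be empty. The sketch below, again on hypothetical timestamps, shows that the size aggregation over pd.Grouper reports such intervals with a count of 0, and how they could be filtered out if they are not wanted.

```python
import pandas as pd

# Hypothetical sample with a gap: nothing falls in [00:00:10, 00:00:15).
df = pd.DataFrame({
    'Date': pd.to_datetime([
        '2015-11-01 00:00:01',
        '2015-11-01 00:00:04',
        '2015-11-01 00:00:05',
        '2015-11-01 00:00:17',
    ]),
})

# Count rows per 5-second bin; empty bins between the first and last
# timestamps are kept with a count of 0.
counts = (
    df.groupby(pd.Grouper(key='Date', freq='5s'))
      .size()
      .reset_index(name='Count')
)
counts['Interval'] = counts.index + 1
print(counts)

# Optional: keep only intervals that contain at least one observation.
non_empty = counts[counts['Count'] > 0]
print(non_empty)
```

Numbering the intervals before filtering keeps the Interval values tied to their position in time, so a gap in the numbering signals an empty interval.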