Group a DataFrame by values that are less than a second apart - pandas


Problem description

Let's say I have a pandas DataFrame as below:

>>> df=pd.DataFrame({'dt':pd.to_datetime(['2018-12-10 16:35:34.246','2018-12-10 16:36:34.243','2018-12-10 16:38:34.216','2018-12-10 16:42:34.123']),'value':[1,2,3,4]})
>>> df
                       dt  value
0 2018-12-10 16:35:34.246      1
1 2018-12-10 16:36:34.243      2
2 2018-12-10 16:38:34.216      3
3 2018-12-10 16:42:34.123      4
>>> 

I would like to group this DataFrame by the 'dt' column, treating values that differ by less than one second as the same. After grouping, I want to sum the 'value' column within each group, and I want the DataFrame to remain the same length, so rows whose timestamps are less than one second apart would all carry the same summed value. So far I have tried:

>>> df.groupby('dt',as_index=False)['value'].sum()
                       dt  value
0 2018-12-10 16:35:34.246      1
1 2018-12-10 16:36:34.243      2
2 2018-12-10 16:38:34.216      3
3 2018-12-10 16:42:34.123      4
>>> 

But as you can see, the DataFrame didn't change, because this groups only by exactly equal 'dt' values.

My desired output is:

                       dt  value
0 2018-12-10 16:35:34.246      3
1 2018-12-10 16:36:34.243      3
2 2018-12-10 16:38:34.216      3
3 2018-12-10 16:42:34.123      4

Solution

A brute force solution is to take the absolute difference between your datetime series and each datetime value, then compare against a threshold:

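import pandas as pd

# data from @StephenCowley: the second timestamp is 16:35:34.243, within one second of the first
df = pd.DataFrame({
    'dt': pd.to_datetime(['2018-12-10 16:35:34.246', '2018-12-10 16:35:34.243',
                          '2018-12-10 16:38:34.216', '2018-12-10 16:42:34.123']),
    'value': [1, 2, 3, 4]})

threshold = pd.Timedelta(seconds=1)

# for each timestamp t, sum 'value' over every row whose 'dt' is within the threshold of t
df['val'] = [df.loc[(df['dt'] - t).abs() < threshold, 'value'].sum()
             for t in df['dt']]

print(df)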

                       dt  value  val
0 2018-12-10 16:35:34.246      1    3
1 2018-12-10 16:35:34.243      2    3
2 2018-12-10 16:38:34.216      3    3
3 2018-12-10 16:42:34.123      4    4
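
The list comprehension above compares every row against the whole series, so its cost grows quadratically with the number of rows. If an actual groupby is preferred, one possible alternative (not part of the original answer, and with slightly different semantics) is to sort by time, start a new group wherever the gap to the previous row reaches the threshold, and broadcast each group's sum back with transform. The names ordered, gap, starts_group and group_id are made up for this sketch, and it assumes the same data as the answer above:

```python
import pandas as pd

df = pd.DataFrame({
    'dt': pd.to_datetime(['2018-12-10 16:35:34.246', '2018-12-10 16:35:34.243',
                          '2018-12-10 16:38:34.216', '2018-12-10 16:42:34.123']),
    'value': [1, 2, 3, 4],
})

threshold = pd.Timedelta(seconds=1)

# Sort by time so that each row is compared with its nearest earlier neighbour.
ordered = df.sort_values('dt')

# A new group starts wherever the gap to the previous row reaches the threshold.
# The first row has no predecessor (its gap is NaT), so it starts a group explicitly.
gap = ordered['dt'].diff()
starts_group = gap.isna() | (gap >= threshold)
group_id = starts_group.cumsum()

# transform('sum') broadcasts each group's total back onto its rows. Assigning to
# df aligns on the original index, so the row order of df is left unchanged.
df['val'] = ordered.groupby(group_id)['value'].transform('sum')

print(df)
```

On this data the 'val' column matches the brute-force result. The two approaches can differ on other inputs: this sketch chains neighbours, so three rows A, B, C where A-B and B-C are each under a second apart end up in one group even if A and C are more than a second apart, whereas the brute-force version sums a separate one-second window around each row.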
