Finding daily maximum and its time-stamp (yyyy:mm:dd hh:mm:ss) in Python Pandas

Problem description

I have about 150 MB of minute-wise measurements covering every day for two years. Here is a sample of the data. I want to create a new data frame containing the maximum of each day together with its time-stamp. My sample data is:

    DateTime            Power
01-Aug-16 10:43:00.000  229.9607961
01-Aug-16 10:43:23.000  230.9030781
01-Aug-16 10:44:00.000  231.716212
01-Aug-16 10:45:00.000  232.4485882
01-Aug-16 10:46:00.000  233.2739154
02-Aug-16 09:42:00.000  229.6851724
02-Aug-16 09:43:00.000  230.9163998
02-Aug-16 09:43:06.000  230.9883337
02-Aug-16 09:44:00.000  231.2569098
02-Aug-16 09:49:00.000  229.5774805
02-Aug-16 09:50:00.000  229.8758693
02-Aug-16 09:51:00.000  229.9825204
03-Aug-16 10:09:00.000  231.3605982
03-Aug-16 10:10:00.000  231.6827163
03-Aug-16 10:11:00.000  231.1580262
03-Aug-16 10:12:00.000  230.4054286
03-Aug-16 10:13:00.000  229.6507959
03-Aug-16 10:13:02.000  229.6268353
03-Aug-16 10:14:00.000  230.4584964
03-Aug-16 10:15:00.000  230.9004206
03-Aug-16 10:16:00.000  231.189036
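To reproduce the answers below, the sample can be loaded into a DataFrame first. This is a minimal sketch that assumes every `DateTime` string follows the `01-Aug-16 10:43:00.000` pattern shown above (English month abbreviation, two-digit year, milliseconds); the name `rows` is hypothetical.

```python
import pandas as pd

# The sample rows from the question, as (DateTime string, Power) pairs.
rows = [
    ("01-Aug-16 10:43:00.000", 229.9607961), ("01-Aug-16 10:43:23.000", 230.9030781),
    ("01-Aug-16 10:44:00.000", 231.716212), ("01-Aug-16 10:45:00.000", 232.4485882),
    ("01-Aug-16 10:46:00.000", 233.2739154), ("02-Aug-16 09:42:00.000", 229.6851724),
    ("02-Aug-16 09:43:00.000", 230.9163998), ("02-Aug-16 09:43:06.000", 230.9883337),
    ("02-Aug-16 09:44:00.000", 231.2569098), ("02-Aug-16 09:49:00.000", 229.5774805),
    ("02-Aug-16 09:50:00.000", 229.8758693), ("02-Aug-16 09:51:00.000", 229.9825204),
    ("03-Aug-16 10:09:00.000", 231.3605982), ("03-Aug-16 10:10:00.000", 231.6827163),
    ("03-Aug-16 10:11:00.000", 231.1580262), ("03-Aug-16 10:12:00.000", 230.4054286),
    ("03-Aug-16 10:13:00.000", 229.6507959), ("03-Aug-16 10:13:02.000", 229.6268353),
    ("03-Aug-16 10:14:00.000", 230.4584964), ("03-Aug-16 10:15:00.000", 230.9004206),
    ("03-Aug-16 10:16:00.000", 231.189036),
]
df = pd.DataFrame(rows, columns=["DateTime", "Power"])

# Assumed format: day-MonthAbbr-two-digit-year, then hh:mm:ss.milliseconds.
df["DateTime"] = pd.to_datetime(df["DateTime"], format="%d-%b-%y %H:%M:%S.%f")
print(df.head())
```

Passing an explicit `format` avoids ambiguity between day-first and month-first parsing and is usually faster on large files than letting pandas infer the layout.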

My current code is:

max_per_day = df.groupby(pd.Grouper(key='time',freq='D')).max()
print(max_per_day)

My current output is:

    time                  
2016-08-01  237.243835
2016-08-02  239.658539
2016-08-03  237.424683
2016-08-04  236.790695
2016-08-05  240.163910

Presently it outputs only yyyy:mm:dd and the value, but I also want the hh:mm (or hh:mm:ss) of each maximum value. I tried the following code:

max_pmpp_day = df.loc[df.groupby(pd.Grouper(freq='D')).idxmax().iloc[:,0]]

The output is:

 TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Int64Index'
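The TypeError comes from `pd.Grouper(freq='D')` without `key=`: it then bins on the index, and the frame still has a plain integer index. A minimal sketch, assuming the datetime column is called `DateTime` as in the sample (the asker's real frame uses `time`), that names the column through `key=` instead of setting an index:

```python
import pandas as pd

df = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2016-08-01 10:43:00", "2016-08-01 10:46:00",
        "2016-08-02 09:42:00", "2016-08-02 09:44:00",
    ]),
    "Power": [229.9607961, 233.2739154, 229.6851724, 231.2569098],
})

# Reproduce the error: without key=, the Grouper looks at the integer index.
try:
    df.groupby(pd.Grouper(freq="D"))
    raised = False
except TypeError:
    raised = True

# Point the Grouper at the datetime column and select Power before idxmax,
# so each day yields the row label of its maximum reading.
labels = df.groupby(pd.Grouper(key="DateTime", freq="D"))["Power"].idxmax()
daily_max = df.loc[labels]
print(daily_max)
```

Because `idxmax` returns row labels, `df.loc` pulls back the whole row, timestamp included.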

I tried @jezrael's answer:

df['DateTime'] = pd.to_datetime(df['time'])
s = df.groupby(pd.Grouper(key='DateTime', freq='D'))['Pmpp'].transform('max')
df = df[df['Pmpp'] == s]    
print(df)

The output is:

                     time        Pmpp            DateTime
34    2016-08-01 11:11:00  237.243835 2016-08-01 11:11:00
434   2016-08-02 13:30:02  239.658539 2016-08-02 13:30:02
648   2016-08-03 12:39:00  237.424683 2016-08-03 12:39:00

Recommended answer

You can use GroupBy.transform or Resampler.transform to return the daily max values as a new Series aligned with the original rows, then compare it with the original column:

df['DateTime'] = pd.to_datetime(df['DateTime'])
s = df.groupby(pd.Grouper(key='DateTime', freq='D'))['Power'].transform('max')
#alternative
#s = df.resample('D', on='DateTime')['Power'].transform('max')
df = df[df['Power'] == s]
print (df)
              DateTime       Power
4  2016-08-01 10:46:00  233.273915
8  2016-08-02 09:44:00  231.256910
13 2016-08-03 10:10:00  231.682716
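One consequence of the `transform` approach is that the boolean mask keeps every row equal to the daily maximum, so a day whose maximum occurs twice yields two rows. A sketch with hypothetical tied values, plus one way to keep a single row per day:

```python
import pandas as pd

# Hypothetical readings: the first day reaches 233.0 twice.
df = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2016-08-01 10:43:00", "2016-08-01 10:44:00", "2016-08-01 10:45:00",
        "2016-08-02 09:42:00", "2016-08-02 09:44:00",
    ]),
    "Power": [233.0, 231.0, 233.0, 229.7, 231.3],
})

# Broadcast each day's maximum back onto every row of that day.
s = df.groupby(pd.Grouper(key="DateTime", freq="D"))["Power"].transform("max")
# The mask keeps every row equal to its day's maximum, ties included.
ties_kept = df[df["Power"] == s]
# Keep the first match per day in row order (the earliest here, since rows are time-sorted).
one_per_day = ties_kept.groupby(ties_kept["DateTime"].dt.date).head(1)
print(ties_kept)
print(one_per_day)
```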

Or create a DatetimeIndex, select the column after groupby, and use idxmax to get the timestamp of each day's maximum:

df['DateTime'] = pd.to_datetime(df['DateTime'])
df = df.set_index('DateTime')
df = df.loc[df.groupby(pd.Grouper(freq='D'))['Power'].idxmax()]
print (df)
                          Power
DateTime                       
2016-08-01 10:46:00  233.273915
2016-08-02 09:44:00  231.256910
2016-08-03 10:10:00  231.682716
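Unlike the mask above, `idxmax` returns only the first label at which the maximum occurs, so ties collapse to one row per day. A sketch with a hypothetical tie, assuming the timestamps are unique (with duplicate index labels, `df.loc` would return every row sharing that label):

```python
import pandas as pd

# Hypothetical readings with a tie on the first day, indexed by timestamp.
idx = pd.DatetimeIndex([
    "2016-08-01 10:43:00", "2016-08-01 10:44:00", "2016-08-01 10:45:00",
    "2016-08-02 09:42:00", "2016-08-02 09:44:00",
], name="DateTime")
df = pd.DataFrame({"Power": [233.0, 231.0, 233.0, 229.7, 231.3]}, index=idx)

# idxmax returns the first timestamp at which each day's maximum occurs.
labels = df.groupby(pd.Grouper(freq="D"))["Power"].idxmax()
daily = df.loc[labels]
print(daily)
```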

Solution from @Jon Clements, thank you:

df = (df.sort_values('Power')
        .groupby(df.DateTime.dt.to_period('D'))
        .last()
        .reset_index(drop=True))
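`GroupBy.last()` takes the last non-null value of each column separately. If `Power` has missing readings, `sort_values` puts them at the end by default, so `DateTime` can come from the NaN row while `Power` comes from the real maximum. A sketch with a hypothetical NaN reading that shows the mismatch and a `dropna` guard (`daily_last` is a hypothetical helper wrapping the pattern above):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2016-08-01 10:43:00", "2016-08-01 10:46:00", "2016-08-01 10:50:00",
        "2016-08-02 09:42:00", "2016-08-02 09:44:00",
    ]),
    # Hypothetical missing reading at 10:50 on the first day.
    "Power": [229.9607961, 233.2739154, np.nan, 229.6851724, 231.2569098],
})

def daily_last(frame):
    # Ascending sort, so each day's largest Power is the last non-null value.
    return (frame.sort_values("Power")
                 .groupby(frame["DateTime"].dt.to_period("D"))
                 .last()
                 .reset_index(drop=True))

naive = daily_last(df)                            # DateTime taken from the NaN row
safe = daily_last(df.dropna(subset=["Power"]))    # every column from the same row
print(naive)
print(safe)
```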
