Resampling in Pandas while keeping value associations

Problem description

Starting with this:

import numpy as np
from pandas import DataFrame

time = np.array(('2015-08-01T07:00:00', '2015-08-01T19:00:00'), dtype='datetime64[ns]')
heat_index = np.array([101, 103])
air_temperature = np.array([96, 95])

df = DataFrame({'air_temperature': air_temperature, 'heat_index': heat_index}, index=time)

which produces df:

                     air_temperature    heat_index
2015-08-01 07:00:00  96                 101
2015-08-01 19:00:00  95                 103

Then resampling daily:

df_daily = df.resample('D').max()  # how='max' was removed from pandas; 'D' gives the same daily bin as '24H' here

gives this df_daily:

            air_temperature     heat_index
2015-08-01  96                  103

So by resampling with how='max', pandas splits the data into 24-hour periods and takes the maximum of each column independently within each period.

But as you can see from the df output for 2015-08-01, that day's maximum heat index (which occurs at 19:00:00) does not line up with the air temperature recorded at the same time. The heat index of 103°F occurred with an air temperature of 95°F, yet the resampled row pairs it with 96°F, the air temperature from 07:00:00. This association is lost through resampling, and we end up looking at the air temperature from a different part of the day.
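
To see the lost association concretely, the sketch below rebuilds the two observations (assuming a recent pandas, where the how= keyword is gone and resample(...).max() is used instead) and checks whether the resampled row matches any real observation. It does not, because each column's maximum is taken on its own.

```python
import numpy as np
import pandas as pd

# Same two observations as in the question
time = np.array(['2015-08-01T07:00:00', '2015-08-01T19:00:00'], dtype='datetime64[ns]')
df = pd.DataFrame({'air_temperature': [96, 95], 'heat_index': [101, 103]}, index=time)

# resample(...).max() takes the maximum of each column independently
daily_max = df.resample('D').max()
print(daily_max)

# Look for an original row whose values equal the resampled row exactly
row = daily_max.iloc[0]
matches = df[(df['air_temperature'] == row['air_temperature']) &
             (df['heat_index'] == row['heat_index'])]
print(len(matches))  # 0: no single observation had both 96 and 103
```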

Is there a way to resample just one column, and preserve the value in another column at the same index? So that the final outcome would look like this:

            air_temperature     heat_index
2015-08-01  95                  103

My first guess is to just resample the heat_index column...

df_daily = df.resample('D').agg({'heat_index': 'max'})  # modern spelling of how={'heat_index': 'max'}

Getting...

            heat_index
2015-08-01  103

...and then trying to do some sort of DataFrame.loc or DataFrame.ix from there, but have been unsuccessful. Any thoughts on how to find the related value after resampling (e.g. to find the air_temperature that occurred at the same time as what is later found to be the maximum heat_index)?
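
The .loc idea can be made to work by resampling the position of the maximum instead of the maximum itself. The sketch below (assuming a recent pandas, with pd.Grouper(freq='D') used for the daily bins) takes idxmax of heat_index per day, which yields the timestamp of each day's peak, and then uses .loc to fetch the full rows at those timestamps. If two readings tie for the maximum, idxmax returns the first one.

```python
import numpy as np
import pandas as pd

time = np.array(['2015-08-01T07:00:00', '2015-08-01T19:00:00'], dtype='datetime64[ns]')
df = pd.DataFrame({'air_temperature': [96, 95], 'heat_index': [101, 103]}, index=time)

# For each daily bin, the timestamp at which heat_index peaks
peak_times = df.groupby(pd.Grouper(freq='D'))['heat_index'].idxmax()

# Fetch the complete rows at those timestamps, then relabel them by day
df_daily = df.loc[peak_times.to_numpy()]
df_daily.index = peak_times.index
print(df_daily)
```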

Recommended answer

Here's one way: .groupby(pd.TimeGrouper()) (now written pd.Grouper(freq=...)) does essentially what resample does internally, and the function passed over each group then reduces that group to its single max-heat_index observation.

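import pandas as pd

# pd.TimeGrouper has been removed; pd.Grouper(freq=...) replaces it.
# apply() hands each daily sub-DataFrame to the lambda, which returns the
# row where heat_index is largest (the first such row if there is a tie).
df_daily = (df.groupby(pd.Grouper(freq='D'))
              .apply(lambda g: g.loc[g['heat_index'].idxmax()]))
print(df_daily)

            air_temperature  heat_index
2015-08-01               95         103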
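
With more than one day of data, the same row selection can also be written without apply: sort by heat_index, group by calendar day, and keep the last row of each group with tail(1). The readings below are hypothetical values invented for illustration. A plain array of midnight-normalized dates is used as the grouping key so that groupby keeps the sorted row order within each day. Because tail(1) returns original rows, the result keeps the timestamp at which each day's peak occurred, whereas the answer's version labels each row by day.

```python
import pandas as pd

# Hypothetical readings every 12 hours over three days (illustrative values only)
idx = pd.to_datetime(['2015-08-01 07:00', '2015-08-01 19:00',
                      '2015-08-02 07:00', '2015-08-02 19:00',
                      '2015-08-03 07:00', '2015-08-03 19:00'])
df = pd.DataFrame({'air_temperature': [96, 95, 97, 99, 94, 93],
                   'heat_index':      [101, 103, 104, 102, 99, 100]}, index=idx)

# Sort ascending (stable) so each day's largest heat_index is the last row of its day
by_heat = df.sort_values('heat_index', kind='mergesort')

# Group by calendar day and keep the last row per day, then restore time order
peaks = by_heat.groupby(by_heat.index.normalize()).tail(1).sort_index()
print(peaks)
```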
