Calculated column from datetime range and group in pandas

Question

I want to calculate the max value per week per group and to create a new column with these values in pandas. I posted a similar question that did not solve my problem, so I restructured the question.

Consider a dataframe with timestamp, group and value columns:

datetime     group    value
2014-05-07   A        3 
2014-05-07   B        4 
2014-05-14   A        4 
2014-05-14   B        2 
2014-05-15   A        6 
2014-05-15   B        4 
2014-05-16   A        7 
2014-05-16   B        10

I want to create a new column with the maximum value per week by group:

datetime     group    value    maxval
2014-05-07   A        3        3
2014-05-07   B        4        4
2014-05-14   A        4        7
2014-05-14   B        2        10
2014-05-15   A        6        7
2014-05-15   B        4        10
2014-05-16   A        7        7
2014-05-16   B        10       10

In the linked question, the solution presented was to transform a groupby and then attach the result to the dataframe; however, this creates ordering errors in the resulting series.

Recommended answer

You can transform groups indexed on both group and the week simultaneously:

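>>> import pandas as pd
>>> # ISO week number per row (DatetimeIndex.week was removed in pandas 2.0)
>>> week = pd.to_datetime(df["datetime"]).dt.isocalendar().week
>>> df["maxval"] = df.groupby(["group", week])["value"].transform("max")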
>>> df
     datetime group  value  maxval
0  2014-05-07     A      3       3
1  2014-05-07     B      4       4
2  2014-05-14     A      4       7
3  2014-05-14     B      2      10
4  2014-05-15     A      6       7
5  2014-05-15     B      4      10
6  2014-05-16     A      7       7
7  2014-05-16     B     10      10

Note that if you have multiple years this will combine the second week (e.g.) of each year into the same group.

Sometimes people want that, but if you don't, you could add the year to the grouped quantities in the same way.
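As a rough illustration of adding the year to the grouped quantities, here is a minimal sketch. The two-year sample data and the column names max_week_only and max_year_week are made up for this example and are not from the question; it also assumes a pandas version that provides Series.dt.isocalendar().

```python
import pandas as pd

# Hypothetical data: the same ISO week (week 2) in two different years
df = pd.DataFrame({
    "datetime": pd.to_datetime(
        ["2014-01-07", "2014-01-08", "2015-01-06", "2015-01-07"]
    ),
    "group": ["A", "A", "A", "A"],
    "value": [3, 5, 9, 1],
})

# isocalendar() returns a frame with ISO "year", "week" and "day" columns
iso = df["datetime"].dt.isocalendar()

# Grouping on the week alone merges week 2 of 2014 with week 2 of 2015
df["max_week_only"] = (
    df.groupby(["group", iso["week"]])["value"].transform("max")
)

# Adding the ISO year as a further key keeps the two years apart
df["max_year_week"] = (
    df.groupby(["group", iso["year"], iso["week"]])["value"].transform("max")
)

print(df)
```

The ISO year is used here rather than the calendar year so that a week straddling New Year stays in a single bin.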

If you want a rolling maximum instead, you can use (appropriately enough) rolling_max. Note that pd.rolling_max has since been removed from pandas; in current releases the same operation is written as .rolling(...).max(). You can either resample yourself or get rolling_max to do it through its freq argument, something like

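def rolling_max_week(x):
    # pd.rolling_max(x, 7, ..., freq='d') was removed from pandas; the equivalent
    # is to resample to daily frequency and take a centered 7-day rolling max
    daily = x.resample("D").max()
    rolled = daily.rolling(7, min_periods=1, center=True).max()
    # keep only the original timestamps and restore the input dtype
    return rolled.loc[x.index].astype(x.dtype)

df["datetime"] = pd.to_datetime(df["datetime"])
df = df.set_index("datetime")
df["rolling_max"] = df.groupby("group")["value"].transform(rolling_max_week)
# ISO week number of each index entry (df.index.week was removed in pandas 2.0)
df["bin_max"] = df.groupby(["group", df.index.isocalendar().week])["value"].transform("max")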

Now as it happens, these two produce exactly the same output on your sample:

>>> df
           group  value  rolling_max  bin_max
datetime                                     
2014-05-07     A      3            3        3
2014-05-07     B      4            4        4
2014-05-14     A      4            7        7
2014-05-14     B      2           10       10
2014-05-15     A      6            7        7
2014-05-15     B      4           10       10
2014-05-16     A      7            7        7
2014-05-16     B     10           10       10

but that won't be true in general. You'll want to read the documentation for rolling_max and play around with some test cases to be sure that I'm interpreting what you want correctly.
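To make the difference concrete, here is a small sketch of a case where the two approaches disagree. The two data points are invented for illustration, and the rolling maximum is written with resample and .rolling(...).max() as an assumed modern stand-in for rolling_max.

```python
import pandas as pd

# Hypothetical series: a Friday and the following Monday, which fall
# in different ISO weeks but are only three days apart
s = pd.Series(
    [10, 1],
    index=pd.to_datetime(["2014-05-09", "2014-05-12"]),
    name="value",
)

# Binned maximum: each ISO week is treated separately
bin_max = s.groupby(s.index.isocalendar().week).transform("max")

# Rolling maximum: a centered 7-day window over the daily-resampled data,
# read back at the original timestamps
rolling_max = (
    s.resample("D").max()
     .rolling(7, min_periods=1, center=True).max()
     .loc[s.index]
)

print(pd.DataFrame({"value": s, "bin_max": bin_max, "rolling_max": rolling_max}))
```

The Monday value keeps its own weekly maximum of 1 under binning, while the centered window reaches back across the week boundary and picks up the 10 from the previous Friday.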
