Taking the first and last value in a rolling window

Question

Using pandas, I would like to apply functions that are available for resample() but not for rolling().

This works:

df1 = df.resample(to_freq,
                  closed='left',
                  kind='period',
                  ).agg(OrderedDict([('Open', 'first'),
                                     ('Close', 'last')]))

This does not work:

df2 = df.rolling(my_indexer).agg(
                 OrderedDict([('Open', 'first'),
                              ('Close', 'last') ]))
>>> AttributeError: 'first' is not a valid function for 'Rolling' object

df3 = df.rolling(my_indexer).agg(
                 OrderedDict([
                              ('Close', 'last') ]))
>>> AttributeError: 'last' is not a valid function for 'Rolling' object

How would you keep the first and last value of each rolling window in two separate columns?

import pandas as pd
from random import seed
from random import randint
from collections import OrderedDict

# DataFrame
ts_1h = pd.date_range(start='2020-01-01 00:00+00:00', end='2020-01-02 00:00+00:00', freq='1h')
seed(1)
values = [randint(0,10) for ts in ts_1h]
df = pd.DataFrame({'Values' : values}, index=ts_1h)

# First & last work with resample
resampled_first = df.resample('3H',
                              closed='left',
                              kind='period',
                             ).agg(OrderedDict([('Values', 'first')]))
resampled_last = df.resample('3H',
                             closed='left',
                             kind='period',
                            ).agg(OrderedDict([('Values', 'last')]))

# They fail with rolling (AttributeError)
rolling_first = df.rolling(3).agg(OrderedDict([('Values', 'first')]))
rolling_last = df.rolling(3).agg(OrderedDict([('Values', 'last')]))

Thanks for your help! Best,

Answer

You can use your own function to get the first or last element of each rolling window:

# rows is a Series with a DatetimeIndex, so index it by position with .iloc
rolling_first = df.rolling(3).agg(lambda rows: rows.iloc[0])
rolling_last  = df.rolling(3).agg(lambda rows: rows.iloc[-1])
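A variant of the same idea, sketched under the assumption that only the window's values matter and not its index: Rolling.apply with raw=True passes each window to the function as a 1-D NumPy array, so [0] and [-1] are ordinary positional indexing and pandas does not build a Series for every window, which is usually faster.

```python
import pandas as pd
from random import seed, randint

# Same hourly sample data as in the question
ts_1h = pd.date_range(start='2020-01-01 00:00+00:00', end='2020-01-02 00:00+00:00', freq='1h')
seed(1)
df = pd.DataFrame({'Values': [randint(0, 10) for _ in ts_1h]}, index=ts_1h)

# raw=True: each window arrives as a NumPy array, so [0] / [-1] are positional
rolling_first = df.rolling(3).apply(lambda w: w[0], raw=True)
rolling_last = df.rolling(3).apply(lambda w: w[-1], raw=True)

# Put both results side by side as two columns
print(pd.concat({'first': rolling_first['Values'],
                 'last': rolling_last['Values']}, axis=1))
```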

Example

import pandas as pd
from random import seed, randint

# DataFrame
ts_1h = pd.date_range(start='2020-01-01 00:00+00:00', end='2020-01-02 00:00+00:00', freq='1h')

seed(1)
values = [randint(0, 10) for ts in ts_1h]

df = pd.DataFrame({'Values' : values}, index=ts_1h)

df['first'] = df['Values'].rolling(3).agg(lambda rows: rows.iloc[0])
df['last']  = df['Values'].rolling(3).agg(lambda rows: rows.iloc[-1])

print(df)

Result

                          Values  first  last
2020-01-01 00:00:00+00:00       2    NaN   NaN
2020-01-01 01:00:00+00:00       9    NaN   NaN
2020-01-01 02:00:00+00:00       1    2.0   1.0
2020-01-01 03:00:00+00:00       4    9.0   4.0
2020-01-01 04:00:00+00:00       1    1.0   1.0
2020-01-01 05:00:00+00:00       7    4.0   7.0
2020-01-01 06:00:00+00:00       7    1.0   7.0
2020-01-01 07:00:00+00:00       7    7.0   7.0
2020-01-01 08:00:00+00:00      10    7.0  10.0
2020-01-01 09:00:00+00:00       6    7.0   6.0
2020-01-01 10:00:00+00:00       3   10.0   3.0
2020-01-01 11:00:00+00:00       1    6.0   1.0
2020-01-01 12:00:00+00:00       7    3.0   7.0
2020-01-01 13:00:00+00:00       0    1.0   0.0
2020-01-01 14:00:00+00:00       6    7.0   6.0
2020-01-01 15:00:00+00:00       6    0.0   6.0
2020-01-01 16:00:00+00:00       9    6.0   9.0
2020-01-01 17:00:00+00:00       0    6.0   0.0
2020-01-01 18:00:00+00:00       7    9.0   7.0
2020-01-01 19:00:00+00:00       4    0.0   4.0
2020-01-01 20:00:00+00:00       3    7.0   3.0
2020-01-01 21:00:00+00:00       9    4.0   9.0
2020-01-01 22:00:00+00:00       1    3.0   1.0
2020-01-01 23:00:00+00:00       5    9.0   5.0
2020-01-02 00:00:00+00:00       0    1.0   0.0
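For a fixed window of n rows, as in rolling(3), the table above shows that the first value of the window ending at row i is simply the value at row i - n + 1, and the last value is the value at row i itself. A minimal vectorized sketch under that assumption, using shift instead of calling a Python function once per window; it does not carry over to time-based windows or custom indexers such as the question's my_indexer.

```python
import numpy as np
import pandas as pd
from random import seed, randint

ts_1h = pd.date_range(start='2020-01-01 00:00+00:00', end='2020-01-02 00:00+00:00', freq='1h')
seed(1)
df = pd.DataFrame({'Values': [randint(0, 10) for _ in ts_1h]}, index=ts_1h)

n = 3  # window length in rows (assumed fixed)
s = df['Values']

# First value of each window: the value n-1 rows earlier (NaN until the window is full)
df['first'] = s.shift(n - 1)
# Last value of each window: the current value, masked to NaN until the window is full
df['last'] = s.where(np.arange(len(s)) >= n - 1)

print(df)
```

On long series this avoids the per-window Python call; the callable version stays the general solution.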

When using a dictionary, you have to pass the lambda itself, not a string:

result = df['Values'].rolling(3).agg({'first': lambda rows: rows.iloc[0], 'last': lambda rows: rows.iloc[-1]})
print(result)

The same applies to your own function: pass the function itself, not a string containing its name:

def first(rows):
    # first value of the window, by position
    return rows.iloc[0]

def last(rows):
    # last value of the window, by position
    return rows.iloc[-1]

result = df['Values'].rolling(3).agg({'first': first, 'last': last})
print(result)
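On a DataFrame, the same rule also covers the question's original goal (first value of one column, last value of another): the dictionary keys passed to Rolling.agg are column names, and each function is applied only to that column's windows. A sketch with hypothetical Open/Close data and a plain 3-row window standing in for the question's my_indexer:

```python
import pandas as pd

# Hypothetical price data using the column names from the question
df_prices = pd.DataFrame(
    {'Open': [10.0, 11.0, 12.0, 13.0, 14.0],
     'Close': [10.5, 11.5, 12.5, 13.5, 14.5]},
    index=pd.date_range('2020-01-01', periods=5, freq='1h'),
)

def first(rows):
    # rows is a Series (agg uses raw=False), so index by position with .iloc
    return rows.iloc[0]

def last(rows):
    return rows.iloc[-1]

# Each key selects a column; its value is the function applied to that column's windows
ohlc = df_prices.rolling(3).agg({'Open': first, 'Close': last})
print(ohlc)
```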

Example

import pandas as pd
from random import seed, randint

# DataFrame
ts_1h = pd.date_range(start='2020-01-01 00:00+00:00', end='2020-01-02 00:00+00:00', freq='1h')

seed(1)
values = [randint(0, 10) for ts in ts_1h]

df = pd.DataFrame({'Values' : values}, index=ts_1h)

result = df['Values'].rolling(3).agg({'first': lambda rows: rows.iloc[0], 'last': lambda rows: rows.iloc[-1]})
print(result)

def first(rows):
    return rows.iloc[0]

# named "last" so that it matches the name used in the dictionary below
def last(rows):
    return rows.iloc[-1]

result = df['Values'].rolling(3).agg({'first': first, 'last': last})
print(result)
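The question builds its windows with a custom my_indexer, whose definition is not shown. The callable approach does not depend on how the windows are built, so as an illustration only, here is a sketch that uses pandas' own FixedForwardWindowIndexer (each window covers the current row and the next two) as a stand-in for that unknown indexer:

```python
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer
from random import seed, randint

ts_1h = pd.date_range(start='2020-01-01 00:00+00:00', end='2020-01-02 00:00+00:00', freq='1h')
seed(1)
s = pd.Series([randint(0, 10) for _ in ts_1h], index=ts_1h, name='Values')

# Stand-in for the question's my_indexer: window = current row plus the next 2 rows
indexer = FixedForwardWindowIndexer(window_size=3)

# min_periods=1 keeps the shorter windows at the end of the series
first = s.rolling(indexer, min_periods=1).apply(lambda w: w[0], raw=True)
last = s.rolling(indexer, min_periods=1).apply(lambda w: w[-1], raw=True)

print(pd.DataFrame({'Values': s, 'first': first, 'last': last}))
```

With forward-looking windows, "first" is the current value and "last" lies up to two rows ahead, the mirror image of the trailing rolling(3) result.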
