Taking the first and last value in a rolling window
Question
Using pandas, I would like to apply functions that are available for resample() but not for rolling().
This works:
df1 = df.resample(to_freq,
                  closed='left',
                  kind='period',
                  ).agg(OrderedDict([('Open', 'first'),
                                     ('Close', 'last'),
                                     ]))
This does not:
df2 = df.rolling(my_indexer).agg(
    OrderedDict([('Open', 'first'),
                 ('Close', 'last')]))
>>> AttributeError: 'first' is not a valid function for 'Rolling' object
df3 = df.rolling(my_indexer).agg(
    OrderedDict([
        ('Close', 'last')]))
>>> AttributeError: 'last' is not a valid function for 'Rolling' object
What would you advise for keeping the first and last value of each rolling window in two separate columns?
import pandas as pd
from random import seed
from random import randint
from collections import OrderedDict
# DataFrame
ts_1h = pd.date_range(start='2020-01-01 00:00+00:00', end='2020-01-02 00:00+00:00', freq='1h')
seed(1)
values = [randint(0,10) for ts in ts_1h]
df = pd.DataFrame({'Values' : values}, index=ts_1h)
# First & last work with resample
resampled_first = df.resample('3H',
                              closed='left',
                              kind='period',
                              ).agg(OrderedDict([('Values', 'first')]))
resampled_last = df.resample('3H',
                             closed='left',
                             kind='period',
                             ).agg(OrderedDict([('Values', 'last')]))
# They don't with rolling
rolling_first = df.rolling(3).agg(OrderedDict([('Values', 'first')]))
rolling_last = df.rolling(3).agg(OrderedDict([('Values', 'last')]))
Thanks for your help! Best,
Recommended answer
You can use your own function to get the first or last element of each rolling window:
# Each window arrives as a Series, so .iloc keeps the lookup positional
rolling_first = df.rolling(3).agg(lambda rows: rows.iloc[0])
rolling_last = df.rolling(3).agg(lambda rows: rows.iloc[-1])
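A variation on the custom-function approach, offered as a minimal sketch rather than the answer's own method: calling apply() with raw=True passes each window to the function as a NumPy array, so plain integer indexing is always positional regardless of the index type. The sample values and the names `values`, `roller`, and `result` below are assumptions made for illustration.

```python
import pandas as pd

# Small illustrative series (assumed data, not taken from the question)
values = pd.Series([2, 9, 1, 4, 1, 7])

# raw=True hands every window to the function as a NumPy array,
# so w[0] and w[-1] are the first and last elements by position
roller = values.rolling(3)
first = roller.apply(lambda w: w[0], raw=True)
last = roller.apply(lambda w: w[-1], raw=True)

# Put both results side by side in two columns
result = pd.DataFrame({'first': first, 'last': last})
print(result)
```

The first two rows are NaN because a window of 3 is not yet complete there, which matches the default min_periods behaviour for integer windows.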
Example
import pandas as pd
from random import seed, randint
# DataFrame
ts_1h = pd.date_range(start='2020-01-01 00:00+00:00', end='2020-01-02 00:00+00:00', freq='1h')
seed(1)
values = [randint(0, 10) for ts in ts_1h]
df = pd.DataFrame({'Values' : values}, index=ts_1h)
df['first'] = df['Values'].rolling(3).agg(lambda rows: rows.iloc[0])
df['last'] = df['Values'].rolling(3).agg(lambda rows: rows.iloc[-1])
print(df)
Result
                           Values  first  last
2020-01-01 00:00:00+00:00       2    NaN   NaN
2020-01-01 01:00:00+00:00       9    NaN   NaN
2020-01-01 02:00:00+00:00       1    2.0   1.0
2020-01-01 03:00:00+00:00       4    9.0   4.0
2020-01-01 04:00:00+00:00       1    1.0   1.0
2020-01-01 05:00:00+00:00       7    4.0   7.0
2020-01-01 06:00:00+00:00       7    1.0   7.0
2020-01-01 07:00:00+00:00       7    7.0   7.0
2020-01-01 08:00:00+00:00      10    7.0  10.0
2020-01-01 09:00:00+00:00       6    7.0   6.0
2020-01-01 10:00:00+00:00       3   10.0   3.0
2020-01-01 11:00:00+00:00       1    6.0   1.0
2020-01-01 12:00:00+00:00       7    3.0   7.0
2020-01-01 13:00:00+00:00       0    1.0   0.0
2020-01-01 14:00:00+00:00       6    7.0   6.0
2020-01-01 15:00:00+00:00       6    0.0   6.0
2020-01-01 16:00:00+00:00       9    6.0   9.0
2020-01-01 17:00:00+00:00       0    6.0   0.0
2020-01-01 18:00:00+00:00       7    9.0   7.0
2020-01-01 19:00:00+00:00       4    0.0   4.0
2020-01-01 20:00:00+00:00       3    7.0   3.0
2020-01-01 21:00:00+00:00       9    4.0   9.0
2020-01-01 22:00:00+00:00       1    3.0   1.0
2020-01-01 23:00:00+00:00       5    9.0   5.0
2020-01-02 00:00:00+00:00       0    1.0   0.0
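The table shows a pattern worth noting: for a fixed window of n rows, "first" is simply the value n - 1 rows earlier and "last" is the current row. Under that observation, the same columns can be produced without calling a Python function per window. This is only a sketch: the helper name `rolling_first_last` is hypothetical, and it assumes an integer window and a series with no missing values (with NaNs present, the result would differ from the rolling version).

```python
import numpy as np
import pandas as pd

def rolling_first_last(series: pd.Series, window: int) -> pd.DataFrame:
    """First and last value of each fixed-size window (hypothetical helper)."""
    # The window ending at row i starts at row i - (window - 1)
    first = series.shift(window - 1)
    # Rows before the first complete window get NaN, like rolling(window)
    complete = np.arange(len(series)) >= window - 1
    last = series.where(complete)
    return pd.DataFrame({'first': first, 'last': last})

# Illustrative data (assumed, not taken from the question)
sample = pd.Series([2, 9, 1, 4, 1, 7])
out = rolling_first_last(sample, 3)
print(out)
```

This only covers integer windows; for time-based or custom indexers the window boundaries are not a constant row offset, so the custom-function approach is still needed there.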
When using a dictionary, you have to pass the lambda itself, not a string:
result = df['Values'].rolling(3).agg({'first': lambda rows: rows.iloc[0], 'last': lambda rows: rows.iloc[-1]})
print(result)
The same applies to your own named functions: pass the function object, not a string containing its name.
def first(rows):
    return rows.iloc[0]
def last(rows):
    return rows.iloc[-1]
result = df['Values'].rolling(3).agg({'first': first, 'last': last})
print(result)
Example
import pandas as pd
from random import seed, randint
# DataFrame
ts_1h = pd.date_range(start='2020-01-01 00:00+00:00', end='2020-01-02 00:00+00:00', freq='1h')
seed(1)
values = [randint(0, 10) for ts in ts_1h]
df = pd.DataFrame({'Values' : values}, index=ts_1h)
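# Variant 1: lambdas passed directly in the dictionary
result = df['Values'].rolling(3).agg({'first': lambda rows: rows.iloc[0], 'last': lambda rows: rows.iloc[-1]})
print(result)
# Variant 2: named functions, passed as objects rather than strings
def first(rows):
    return rows.iloc[0]
def last(rows):
    return rows.iloc[-1]
result = df['Values'].rolling(3).agg({'first': first, 'last': last})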
print(result)
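The question uses a custom `my_indexer` rather than a fixed row count, and its definition is not shown. As a rough illustration of how the same idea might carry over to windows of varying length, here is a sketch with a time-based window; the '3h' offset, the sample data, and all variable names are assumptions, and raw=True is used so that indexing inside the function is positional.

```python
import pandas as pd

# Hourly sample data (assumed for illustration)
index = pd.date_range(start='2020-01-01 00:00', periods=6, freq='1h')
series = pd.Series([2, 9, 1, 4, 1, 7], index=index)

# A time-based window covers the interval (t - 3h, t]; near the start
# of the series it simply contains fewer rows (min_periods defaults to 1)
time_roller = series.rolling('3h')
first = time_roller.apply(lambda w: w[0], raw=True)
last = time_roller.apply(lambda w: w[-1], raw=True)

print(pd.DataFrame({'Values': series, 'first': first, 'last': last}))
```

Unlike the integer window, the early rows are not NaN here, because an offset window is considered valid as soon as it holds one observation.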