Rolling idxmin/max for pandas DataFrame

Views: 87 Published: 2021/5/15 21:05:30 Tags: python pandas dataframe indexing rolling-computation


Question

I believe the following function is a working solution for pandas DataFrame rolling argmin/max:

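import numpy as np
import pandas as pd


def data_frame_rolling_arg_func(df, window_size, func):
    # func is 'min' or 'max'; assumes df contains no NaN values.
    ws = window_size
    wm1 = window_size - 1
    # Note: applymap is deprecated since pandas 2.1 in favor of DataFrame.map.
    return (df.rolling(ws).apply(getattr(np, f'arg{func}'))[wm1:].astype(int) +
            np.array([np.arange(len(df) - wm1)]).T).applymap(
                lambda x: df.index[x]).combine_first(df.applymap(lambda x: np.nan))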

It is inspired by a partial solution for rolling idxmax on a pandas Series.

Explanation:

Apply the numpy argmin/argmax function to the rolling window.
Keep only the non-NaN values.
Convert the values to int.
Realign the values to the original row numbers.
Use applymap to replace the row numbers by the index values.
Combine with the original DataFrame filled with NaN, in order to add the first rows with the expected NaN values.
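
The six steps above can be sketched one per line. This is only an illustrative rewrite, not the function from the question: the name `rolling_idx_steps` is hypothetical, the last two steps use NumPy indexing rather than `applymap` and `combine_first`, and the sketch assumes that `df` contains no NaN values and has at least `window_size` rows.

```python
import numpy as np
import pandas as pd


def rolling_idx_steps(df, window_size, func):
    """Rolling idxmin/idxmax, one step per line (func is 'min' or 'max')."""
    wm1 = window_size - 1
    arg_func = getattr(np, f"arg{func}")
    # 1. Position of the extreme value inside each window
    #    (NaN for the incomplete leading windows).
    in_window = df.rolling(window_size).apply(arg_func, raw=True)
    # 2. Keep only the rows with complete windows (the non-NaN values).
    complete = in_window.iloc[wm1:]
    # 3. Convert the float positions to int.
    positions = complete.to_numpy().astype(int)
    # 4. Realign: add the row number at which each window starts.
    rows = positions + np.arange(len(df) - wm1)[:, None]
    # 5. Replace the row numbers by index labels.
    labels = df.index.to_numpy()[rows]
    # 6. Put the labels into a frame whose leading rows stay NaN.
    out = np.full(df.shape, np.nan, dtype=object)
    out[wm1:] = labels
    return pd.DataFrame(out, index=df.index, columns=df.columns)


# Same values as the example DataFrame shown in the question.
df = pd.DataFrame(
    [[-4, 15, 0], [0, -6, 4], [7, 8, -18], [11, 12, -16], [6, 3, -6],
     [-1, 4, -9], [6, -10, -7], [8, 11, -25], [-2, -10, -8], [0, 10, -7]],
    index=list("abcdefghij"),
)
result_max = rolling_idx_steps(df, 3, "max")
result_min = rolling_idx_steps(df, 3, "min")
print(result_max)
print(result_min)
```

With the example data, row c of `result_max` comes out as c, a, b, and the first two rows are NaN.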


In [1]: index = map(chr, range(ord('a'), ord('a') + 10))

In [2]: df = pd.DataFrame((10 * np.random.randn(10, 3)).astype(int), index=index)

In [3]: df
Out[3]: 
    0   1   2
a  -4  15   0
b   0  -6   4
c   7   8 -18
d  11  12 -16
e   6   3  -6
f  -1   4  -9
g   6 -10  -7
h   8  11 -25
i  -2 -10  -8
j   0  10  -7

In [4]: data_frame_rolling_arg_func(df, 3, 'max')
Out[4]: 
     0    1    2
a  NaN  NaN  NaN
b  NaN  NaN  NaN
c    c    a    b
d    d    d    b
e    d    d    e
f    d    d    e
g    e    f    e
h    h    h    g
i    h    h    g
j    h    h    j

In [5]: data_frame_rolling_arg_func(df, 3, 'min')
Out[5]: 
     0    1    2
a  NaN  NaN  NaN
b  NaN  NaN  NaN
c    a    b    c
d    b    b    c
e    e    e    c
f    f    e    d
g    f    g    f
h    f    g    h
i    i    g    h
j    i    i    h

My questions are:

Can you find any mistakes?
Is there a better solution, that is, one that is more performant and/or more elegant?

And for pandas maintainers out there: it would be nice if the already great pandas library included rolling idxmax and idxmin.

Recommended answer

The solution in my previous answer manages to give proper index values for NaN input values, but I have realized that this is most probably not what a native pandas rolling idxmin/idxmax would do by default. By default, it would produce a NaN value if there are one or more NaN values in the window.
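
As a small illustration of that default, the built-in rolling min already behaves this way, and so does `rolling(...).apply(...)` with the default `min_periods`. The Series values below are made up for the example.

```python
import numpy as np
import pandas as pd

# Made-up values with a single NaN at position 2.
s = pd.Series([3.0, 1.0, np.nan, 4.0, 2.0, 5.0])

# With the default min_periods (equal to the window size), every window
# that is incomplete or contains a NaN yields NaN.
rolling_min = s.rolling(3).min()

# The same holds for a custom function: it is only called on windows
# that contain enough non-NaN observations.
rolling_argmin = s.rolling(3).apply(np.argmin, raw=True)

print(rolling_min)
print(rolling_argmin)
```

Only the last window, (4.0, 2.0, 5.0), is free of NaN, so it is the only one that produces a value.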

I came up with a variant of my solution, which does that:

import numpy as np
import pandas as pd


def transform_if_possible(func):
    def f(i):
        try:
            return func(i)
        except ValueError:
            return i
    return f


int_if_possible = transform_if_possible(int)


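def data_frame_rolling_idx_func(df, window_size, func):
    # func is 'min' or 'max'; windows containing NaN yield NaN.
    ws = window_size
    wm1 = window_size - 1

    # int(i) raises ValueError for NaN, so NaN is passed through unchanged;
    # recent pandas versions also reject float positions in df.index[...].
    index_if_possible = transform_if_possible(lambda i: df.index[int(i)])

    # Note: applymap is deprecated since pandas 2.1 in favor of DataFrame.map.
    return (df.rolling(ws).apply(getattr(np, f'arg{func}'), raw=True).applymap(int_if_possible) +
            np.array([np.arange(len(df)) - wm1]).T).applymap(index_if_possible)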


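def main():
    print(int_if_possible(1.2))
    print(int_if_possible(np.nan))
    index = list(map(chr, range(ord('a'), ord('a') + 10)))
    df = pd.DataFrame((10 * np.random.randn(10, 3)).astype(int), index=index)
    # Make column 0 float so that it can hold NaN, then blank out rows d to f;
    # .iloc avoids the chained assignment df[0][3:6] = ...
    df[0] = df[0].astype(float)
    df.iloc[3:6, 0] = np.nan
    print(df)
    print(data_frame_rolling_idx_func(df, 3, 'min'))
    print(data_frame_rolling_idx_func(df, 3, 'max'))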


if __name__ == "__main__":
    main()

Result:

1
nan
      0   1   2
a  15.0  -2  13
b  -6.0  -4  -3
c -12.0  -7  -8
d   NaN   0  -4
e   NaN  -1 -11
f   NaN  -9  10
g  -1.0  24   1
h -15.0  14 -16
i   7.0  -4  14
j  -1.0   4  10
     0    1    2
a  NaN  NaN  NaN
b  NaN  NaN  NaN
c    c    c    c
d  NaN    c    c
e  NaN    c    e
f  NaN    f    e
g  NaN    f    e
h  NaN    f    h
i    h    i    h
j    h    i    h
     0    1    2
a  NaN  NaN  NaN
b  NaN  NaN  NaN
c    a    a    a
d  NaN    d    b
e  NaN    d    d
f  NaN    d    f
g  NaN    g    f
h  NaN    g    f
i    i    g    i
j    i    h    i

To achieve my goal, I am using two functions that transform values into integers and row numbers into index values, respectively, and that leave NaN unchanged. I construct these functions with the help of a common closure, transform_if_possible. In the second case, since the index transformation depends on the DataFrame, I construct the transformation function from a local lambda function.

Apart from these aspects, the solution is similar to my previous one, but since NaN is handled explicitly, I no longer need special handling of the first window_size - 1 rows, so the code is a little shorter.

A nice side effect of this solution is that the running time seems to be lower: a little over three times the running time of the corresponding rolling min/max, instead of five times.
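
Part of the remaining overhead plausibly comes from `rolling(...).apply(...)` calling a Python function once per window. As an untimed sketch of a different technique, not the method of this answer, the windows can be built in one step with NumPy's `sliding_window_view` (NumPy 1.20 or later). The name `rolling_idx_vectorized` is hypothetical, and the sketch assumes numeric data and at least `window_size` rows.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def rolling_idx_vectorized(df, window_size, func):
    """Rolling idxmin/idxmax without a per-window Python call (func: 'min' or 'max')."""
    wm1 = window_size - 1
    values = df.to_numpy(dtype=float)
    # Shape: (n_rows - window_size + 1, n_columns, window_size).
    windows = sliding_window_view(values, window_size, axis=0)
    # Position of the extreme value inside each window.
    positions = getattr(np, f"arg{func}")(windows, axis=-1)
    # Add the row number at which each window starts.
    rows = positions + np.arange(len(df) - wm1)[:, None]
    labels = df.index.to_numpy()[rows].astype(object)
    # A window containing at least one NaN yields NaN.
    labels[np.isnan(windows).any(axis=-1)] = np.nan
    # The first window_size - 1 rows have no complete window and stay NaN.
    out = np.full(df.shape, np.nan, dtype=object)
    out[wm1:] = labels
    return pd.DataFrame(out, index=df.index, columns=df.columns)


# Same values as the DataFrame printed in the result above.
nan = np.nan
df = pd.DataFrame(
    {0: [15, -6, -12, nan, nan, nan, -1, -15, 7, -1],
     1: [-2, -4, -7, 0, -1, -9, 24, 14, -4, 4],
     2: [13, -3, -8, -4, -11, 10, 1, -16, 14, 10]},
    index=list("abcdefghij"),
)
res_min = rolling_idx_vectorized(df, 3, "min")
res_max = rolling_idx_vectorized(df, 3, "max")
print(res_min)
print(res_max)
```

On this data the sketch reproduces the two tables printed above. Whether it is actually faster would need to be measured, for example with `timeit`, on data of a realistic size.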

All in all, a better solution I think.
