Python pandas: apply a function to dataframe.rolling()


Question

I have this dataframe:

In[1]: df = pd.DataFrame([[1,2,3,4,5],[6,7,8,9,10],[11,12,13,14,15],[16,17,18,19,20],[21,22,23,24,25]])
In[2]: df
Out[2]: 
    0   1   2   3   4
0   1   2   3   4   5
1   6   7   8   9  10
2  11  12  13  14  15
3  16  17  18  19  20
4  21  22  23  24  25

I need to do the following:

  1. For each row in my dataframe,
  2. if 2 or more values in any 3 consecutive cells are greater than 10,
  3. then the last of those 3 cells should be marked True.

The resulting dataframe df1 should be the same size as df, with True or False in it based on the criteria stated above:

In[3]: df1
Out[3]: 
    0   1      2      3      4
0 NaN NaN  False  False  False
1 NaN NaN  False  False  False
2 NaN NaN   True   True   True
3 NaN NaN   True   True   True
4 NaN NaN   True   True   True

  • df1.iloc[0,1] is NaN because only two numbers are available for that cell, and at least 3 are needed to run the test.
  • df1.iloc[1,3] is False because none of [7,8,9] is greater than 10.
  • df1.iloc[3,4] is True because 2 or more of [18,19,20] are greater than 10.

I figured dataframe.rolling.apply() with a function might be the solution, but how exactly?

Recommended answer

You are right that rolling() is the way to go. Keep in mind, however, that rolling().apply() stores the function's return value at the position of the last cell of each window. The function therefore has to return a value for every window: one that stands for True when the condition holds, and one that stands for False when it does not.

Here is code that uses your sample dataframe and performs the desired transformation:

    import pandas as pd

    df = pd.DataFrame([[1,2,3,4,5],[6,7,8,9,10],[11,12,13,14,15],[16,17,18,19,20],[21,22,23,24,25]])
    

Now define a function that takes a window as its argument and returns whether the condition is satisfied:

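    def fun(x):
        # Count the values in the window that exceed the threshold of 10.
        num = 0
        for i in x:
            num += 1 if i > 10 else 0
        # 1 stands for True, -1 stands for False.
        return 1 if num >= 2 else -1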
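Before wiring the function into rolling(), it can help to call it directly on a few hand-picked windows. The sketch below repeats fun as defined above; the three sample windows are illustrative choices, with the first two taken from the question's examples.

```python
def fun(x):
    # Count the values in the window that exceed the threshold of 10.
    num = 0
    for i in x:
        num += 1 if i > 10 else 0
    # 1 stands for True, -1 stands for False.
    return 1 if num >= 2 else -1


print(fun([7, 8, 9]))     # no value above 10, so -1
print(fun([18, 19, 20]))  # three values above 10, so 1
print(fun([9, 10, 11]))   # only one value above 10, so -1
```

Note that the comparison is strict, so a value of exactly 10 does not count towards the two required hits.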
    

The threshold is hardcoded as 10. If the number of values greater than 10 in a window is greater than or equal to 2, then the last value of the window is replaced by 1 (denoting True); otherwise it is replaced by -1 (denoting False).

If you want to keep the threshold parameters as variables, have a look at this answer on passing them as arguments.
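One way to pass the thresholds as arguments is the kwargs parameter of Rolling.apply(), which forwards keyword arguments to the function. The following is a minimal sketch, not the linked answer's code: the function name count_above, its parameters threshold and min_count, and the sample row are all hypothetical names and data chosen for illustration.

```python
import pandas as pd


def count_above(window, threshold, min_count):
    # Hypothetical parameterized variant of fun(): count the values
    # above `threshold` and compare the count against `min_count`.
    hits = sum(1 for value in window if value > threshold)
    return 1 if hits >= min_count else -1


# A single illustrative row, so that no axis handling is needed here.
row = pd.Series([11, 2, 13, 4, 5])

# kwargs is forwarded to count_above for every window.
result = row.rolling(3).apply(count_above, kwargs={"threshold": 10, "min_count": 2})
print(result)
```

Only the first full window, [11, 2, 13], contains two values above 10, so only the third position is marked with 1.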

Now apply the function on a rolling window of size 3 along axis 1. If you don't want NaN in the result, you can additionally set min_periods to 1 in the arguments of rolling().

    df.rolling(3, axis=1).apply(fun)
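Newer pandas releases deprecate the axis argument of rolling() and recommend transposing the dataframe instead. If the call above warns or fails on your version, the sketch below should give the same row-wise windows. It also shows the effect of min_periods=1. The variable names by_rows and no_nan are illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15],
                   [16, 17, 18, 19, 20], [21, 22, 23, 24, 25]])


def fun(x):
    num = 0
    for i in x:
        num += 1 if i > 10 else 0
    return 1 if num >= 2 else -1


# Transpose so that each original row becomes a column, roll down the
# columns, then transpose back to the original orientation.
by_rows = df.T.rolling(3).apply(fun).T

# With min_periods=1 the shorter leading windows are evaluated too,
# so the first two columns hold -1 or 1 instead of NaN.
no_nan = df.T.rolling(3, min_periods=1).apply(fun).T

print(by_rows)
print(no_nan)
```

With min_periods=1 a window of one cell can never reach two hits, so the first column is always -1.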
    

This produces the output:

        0   1    2    3    4
    0 NaN NaN -1.0 -1.0 -1.0
    1 NaN NaN -1.0 -1.0 -1.0
    2 NaN NaN  1.0  1.0  1.0
    3 NaN NaN  1.0  1.0  1.0
    4 NaN NaN  1.0  1.0  1.0
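The question asked for True and False rather than 1 and -1. A possible final step, sketched here under the assumption that the NaN cells should stay NaN, maps the numeric result to booleans. It uses the transpose form of the rolling call so that it does not depend on the axis argument. The name res is illustrative, and df1 follows the question.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15],
                   [16, 17, 18, 19, 20], [21, 22, 23, 24, 25]])


def fun(x):
    num = 0
    for i in x:
        num += 1 if i > 10 else 0
    return 1 if num >= 2 else -1


# Row-wise rolling result containing 1.0, -1.0 and NaN.
res = df.T.rolling(3).apply(fun).T

# 1.0 becomes True and -1.0 becomes False; casting to object first lets
# where() put NaN back into the cells that had no full window.
df1 = res.eq(1).astype(object).where(res.notna())
print(df1)
```

The object dtype is what allows True, False and NaN to share a column. If NaN is not needed, res.eq(1) alone gives a plain boolean dataframe.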
    
