Pandas apply, but access previously calculated value

Question

Suppose I have a DataFrame (or Series) like this:

     Value
0    0.5
1    0.8
2    -0.2
3    None
4    None
5    None

I wish to create a new Result column.

The value of each result is determined by the previous Value, via an arbitrary function f.

If the previous Value is not available (None or NaN), I wish to use instead the previous Result (and apply f to it, of course).

Using the previous Value is easy, I just need to use shift. However, accessing the previous result doesn't seem to be that simple.

For example, the following code calculates the result, but cannot access the previous result if needed.

df['Result'] = df['Value'].shift(1).apply(f)

Please assume that f is arbitrary, and thus solutions using things like cumsum are not possible.

Obviously, this can be done by iteration, but I want to know if a more Panda-y solution exists.

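import numpy as np
import pandas as pd

df['Result'] = np.nan
for i in range(1, len(df)):
    # Positional access goes through the Series; .iloc does not accept column labels.
    value = df['Value'].iloc[i - 1]
    if pd.isna(value):  # handles both None and NaN
        value = df['Result'].iloc[i - 1]
    df.loc[df.index[i], 'Result'] = f(value)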

Example output, given f = lambda x: x+1:

Bad:

   Value    Result
0    0.5       NaN
1    0.8       1.5
2   -0.2       1.8
3    NaN       0.8
4    NaN       NaN
5    NaN       NaN

Good:

   Value    Result
0    0.5       NaN
1    0.8       1.5
2   -0.2       1.8
3    NaN       0.8
4    NaN       1.8   <-- previous Value not available, used f(previous result)
5    NaN       2.8   <-- same

Recommended answer

Looks like it has to be a loop to me. And I abhor loops... so when I loop, I use numba

Numba gives you the power to speed up your applications with high performance functions written directly in Python. With a few annotations, array-oriented and math-heavy Python code can be just-in-time compiled to native machine instructions, similar in performance to C, C++ and Fortran, without having to switch languages or Python interpreters.

https://numba.pydata.org/

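import numpy as np
from numba import njit


@njit
def f(x):
    return x + 1

@njit
def g(a):
    # The first row has no previous Value, so its result is NaN.
    r = [np.nan]
    for v in a[:-1]:
        if np.isnan(v):
            # Previous Value is missing: apply f to the previous result.
            r.append(f(r[-1]))
        else:
            r.append(f(v))
    return r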

df.assign(Result=g(df.Value.values))

   Value  Result
0    0.5     NaN
1    0.8     1.5
2   -0.2     1.8
3    NaN     0.8
4    NaN     1.8
5    NaN     2.8
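
If pulling in numba is not an option, the same recurrence can also be written as a fold with the standard library's itertools.accumulate, which carries the previous result forward as its running state. This is only a sketch and not part of the original answer: the helper names is_missing and carry_forward are made up for illustration, and it assumes missing entries are None or NaN.

```python
import math
from itertools import accumulate


def f(x):
    # Example function from the question; any one-argument function works.
    return x + 1


def is_missing(v):
    # Treat both None and float NaN as "not available".
    return v is None or (isinstance(v, float) and math.isnan(v))


def carry_forward(values, func):
    # Hypothetical helper: result[0] is NaN; result[i] is func(values[i-1]),
    # or func(result[i-1]) when values[i-1] is missing.
    if len(values) == 0:
        return []

    def step(prev_result, prev_value):
        return func(prev_result if is_missing(prev_value) else prev_value)

    # `initial` (Python 3.8+) seeds the first result; values[:-1] supplies
    # the "previous Value" for every later row.
    return list(accumulate(values[:-1], step, initial=math.nan))


values = [0.5, 0.8, -0.2, None, None, None]
result = carry_forward(values, f)
```

With a DataFrame this could be applied as df['Result'] = carry_forward(df['Value'].tolist(), f). It is still a Python-level loop under the hood, so on large inputs it should not be expected to match the numba version's speed.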
