Pandas apply, but access previously calculated value

Question

Suppose I have a DataFrame (or Series) like this:

     Value
0    0.5
1    0.8
2    -0.2
3    None
4    None
5    None

I wish to create a new Result column.

The value of each result is determined by the previous Value, via an arbitrary function f.

If the previous Value is not available (None or NaN), I wish to use instead the previous Result (and apply f to it, of course).

Using the previous Value is easy, I just need to use shift. However, accessing the previous result doesn't seem to be that simple.

For example, the following code calculates the result, but cannot access the previous result if needed.

df['Result'] = df['Value'].shift(1).apply(f)

Please assume that f is arbitrary, and thus solutions using things like cumsum are not possible.

Obviously, this can be done by iteration, but I want to know if a more Panda-y solution exists.

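import numpy as np
import pandas as pd

df['Result'] = np.nan
for i in range(1, len(df)):
    # .at does label-based scalar access; .iloc does not accept column names
    value = df.at[i - 1, 'Value']
    if pd.isna(value):  # True for both None and NaN
        value = df.at[i - 1, 'Result']
    df.at[i, 'Result'] = f(value)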


Example output, given f = lambda x: x+1:

Bad:

   Value    Result
0    0.5       NaN
1    0.8       1.5
2   -0.2       1.8
3    NaN       0.8
4    NaN       NaN
5    NaN       NaN

Good:

   Value    Result
0    0.5       NaN
1    0.8       1.5
2   -0.2       1.8
3    NaN       0.8
4    NaN       1.8   <-- previous Value not available, used f(previous result)
5    NaN       2.8   <-- same

Recommended answer

Looks like it has to be a loop to me. And I abhor loops... so when I loop, I use numba

Numba gives you the power to speed up your applications with high performance functions written directly in Python. With a few annotations, array-oriented and math-heavy Python code can be just-in-time compiled to native machine instructions, similar in performance to C, C++ and Fortran, without having to switch languages or Python interpreters.

https://numba.pydata.org/

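import numpy as np
from numba import njit


@njit
def f(x):
    return x + 1

@njit
def g(a):
    # The first row has no previous value, so its result is NaN.
    r = [np.nan]
    # Each result depends on the previous row, so the last value is skipped.
    for v in a[:-1]:
        if np.isnan(v):
            # Previous value missing: apply f to the previous result instead.
            r.append(f(r[-1]))
        else:
            r.append(f(v))
    return r

df.assign(Result=g(df.Value.values))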

   Value  Result
0    0.5     NaN
1    0.8     1.5
2   -0.2     1.8
3    NaN     0.8
4    NaN     1.8
5    NaN     2.8
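
If numba is not an option, the same recurrence can be sketched with only the standard library. This is not part of the original answer: the helper names `is_missing` and `results_from_previous` are hypothetical, the input is assumed to be a non-empty list of floats or None, and `itertools.accumulate` with `initial` requires Python 3.8 or later.

```python
import math
from itertools import accumulate


def is_missing(v):
    """Return True for None or NaN."""
    return v is None or math.isnan(v)


def results_from_previous(values, f):
    """result[0] is NaN; result[i] is f(values[i-1]), or f(result[i-1])
    when values[i-1] is missing. Assumes a non-empty input."""
    def step(prev_result, prev_value):
        return f(prev_result if is_missing(prev_value) else prev_value)

    # `initial` seeds the first result with NaN; the last value is
    # dropped because no row follows it.
    return list(accumulate(values[:-1], step, initial=math.nan))


values = [0.5, 0.8, -0.2, None, None, None]
result = results_from_previous(values, lambda x: x + 1)
```

With the DataFrame from the question, something like `df['Result'] = results_from_previous(df['Value'].tolist(), f)` should assign the column. It still iterates at plain Python speed, so it trades numba's speed for having no extra dependency.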
