Pandas dataframe forward-fill with decay

Question

I am running Python 3.5 and pandas 0.19.2. I have a dataframe like the one below. Forward-filling the missing values is straightforward.

import pandas as pd
import numpy as np

d = {'A': np.array([10, np.nan, np.nan, -3, np.nan, 4, np.nan, 0]),
     'B': np.array([np.nan, np.nan, 5, -3, np.nan, np.nan, 0, np.nan ])}
df = pd.DataFrame(d)
df_filled = df.fillna(axis='index', method='ffill')
print(df_filled)
Out[8]: 
      A    B
0  10.0  NaN
1  10.0  NaN
2  10.0  5.0
3  -3.0 -3.0
4  -3.0 -3.0
5   4.0 -3.0
6   4.0  0.0
7   0.0  0.0

My question is: what is the best way to implement a forward fill with decay? I understand that DataFrame.ffill() and DataFrame.fillna() do not support this. For instance, the output I am after is shown below (in contrast with the regular ffill above), where the carried-over value halves at each period:

Out[5]: 
      A    B
0  10.0  NaN
1   5.0  NaN
2   2.5  5.0
3  -3.0 -3.0
4  -1.5 -1.5
5   4.0 -0.75
6   2.0  0.0
7   0.0  0.0

Recommended answer

Yes, there's no simple built-in way to do this. I'd recommend doing it one column at a time, using groupby and apply.

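for c in df:
    # Each non-null value starts a new group; the NaNs after it share its group id.
    groups = df[c].notnull().cumsum()
    # group_keys=False keeps the original row index on the result.
    df[c] = df[c].groupby(groups, group_keys=False).apply(
        # Fill forward within the group, then halve once per step since the last value.
        lambda y: y.ffill() / 2 ** np.arange(len(y))
    )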

df
      A     B
0  10.0   NaN
1   5.0   NaN
2   2.5  5.00
3  -3.0 -3.00
4  -1.5 -1.50
5   4.0 -0.75
6   2.0  0.00
7   0.0  0.00
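
The same grouping idea can also be written without apply. The following is a minimal sketch, not part of the original answer: it counts how many rows have passed since the last observed value using groupby(...).cumcount(), then scales the ordinary forward fill by the decay factor raised to that count. The function name ffill_decay and the factor parameter are hypothetical names chosen for illustration.

```python
import numpy as np
import pandas as pd


def ffill_decay(df, factor=0.5):
    """Forward-fill each column, multiplying the carried value by
    `factor` for every row since the last observed value.
    (Hypothetical helper, written for illustration only.)"""
    out = df.copy()
    for col in df:
        s = df[col]
        # Each non-null value starts a new group; trailing NaNs share its id.
        groups = s.notnull().cumsum()
        # Position within the group = number of rows since the last observation.
        steps = s.groupby(groups).cumcount()
        # Plain forward fill, scaled by factor ** steps (steps == 0 leaves
        # observed values unchanged; leading NaNs stay NaN).
        out[col] = s.ffill() * factor ** steps
    return out


d = {'A': np.array([10, np.nan, np.nan, -3, np.nan, 4, np.nan, 0]),
     'B': np.array([np.nan, np.nan, 5, -3, np.nan, np.nan, 0, np.nan])}
df = pd.DataFrame(d)
result = ffill_decay(df, factor=0.5)
print(result)
```

With factor=0.5 this should reproduce the halving behaviour asked for in the question; other factors give a different decay rate. Unlike the loop above, it returns a new dataframe instead of overwriting the columns in place.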
