Pandas: Dynamically replace NaN values with the average of previous and next non-missing values

Question

I have a dataframe df with NaN values, and I want to dynamically replace each of them with the average of the previous and the next non-missing value.

In [27]: df 
Out[27]: 
          A         B         C
0 -0.166919  0.979728 -0.632955
1 -0.297953 -0.912674 -1.365463
2 -0.120211 -0.540679 -0.680481
3       NaN -2.027325  1.533582
4       NaN       NaN  0.461821
5 -0.788073       NaN       NaN
6 -0.916080 -0.612343       NaN
7 -0.887858  1.033826       NaN
8  1.948430  1.025011 -2.982224
9  0.019698 -0.795876 -0.046431

For example, A[3] is NaN, so its value should be (-0.120211-0.788073)/2 = -0.454142. A[4] should then be (-0.454142-0.788073)/2 = -0.621108, because it averages the newly filled A[3] with the next non-missing value, A[5].

Therefore, the resulting dataframe should look like this:

In [27]: df 
Out[27]: 
          A         B         C
0 -0.166919  0.979728 -0.632955
1 -0.297953 -0.912674 -1.365463
2 -0.120211 -0.540679 -0.680481
3 -0.454142 -2.027325  1.533582
4 -0.621108 -1.319834  0.461821
5 -0.788073 -0.966089 -1.260202
6 -0.916080 -0.612343 -2.121213
7 -0.887858  1.033826 -2.551718
8  1.948430  1.025011 -2.982224
9  0.019698 -0.795876 -0.046431
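
The rule described above can be sketched as a plain loop, which may make the intended behaviour easier to check against the expected table. This is only a minimal sketch: the function name `fill_with_running_average` is hypothetical, and it assumes that NaNs with no valid value before them or after them should simply stay NaN.

```python
import numpy as np
import pandas as pd


def fill_with_running_average(df: pd.DataFrame) -> pd.DataFrame:
    """Fill each NaN with the mean of the value just above it (original or
    already filled) and the next non-missing value below it."""
    out = df.copy()
    for col in out.columns:
        values = out[col].to_numpy(dtype=float, copy=True)
        # For every position, the next valid value at or after it (NaN if none)
        next_valid = out[col].bfill().to_numpy(dtype=float)
        for i in range(1, len(values)):
            has_neighbours = not np.isnan(values[i - 1]) and not np.isnan(next_valid[i])
            if np.isnan(values[i]) and has_neighbours:
                values[i] = (values[i - 1] + next_valid[i]) / 2
        out[col] = values
    return out


# Sample data copied from the question
nan = np.nan
df = pd.DataFrame({
    "A": [-0.166919, -0.297953, -0.120211, nan, nan,
          -0.788073, -0.916080, -0.887858, 1.948430, 0.019698],
    "B": [0.979728, -0.912674, -0.540679, -2.027325, nan,
          nan, -0.612343, 1.033826, 1.025011, -0.795876],
    "C": [-0.632955, -1.365463, -0.680481, 1.533582, 0.461821,
          nan, nan, nan, -2.982224, -0.046431],
})
filled = fill_with_running_average(df)
print(filled)
```

Unlike a closed-form approach, the loop does not care how many separate runs of NaNs a column contains, at the cost of being slower on large frames.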

Is this a good way to deal with the missing values? I can't simply replace them with the mean of each column, because my data is a time series and tends to increase over time. (The initial value may be $0 and the final value might be $100000, so the column mean is $50000, which can be much larger or smaller than the values that are missing.)

Recommended answer

The averaging rule you describe is a geometric progression, so each filled value can be computed directly from the values on either side of the NaN run:

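# s is the running count of NaNs per column (n in the explanation below)
s = df.isnull().cumsum()
# t1 is x, the last valid value before the NaN run in each column
t1 = df[(s == 1).shift(-1, fill_value=False)].stack().dropna().reset_index(level=0, drop=True)
# t2 is y, the first valid value after the run (df.lookup was removed in pandas 2.0)
t2 = df.apply(lambda col: col.loc[s[col.name].idxmax() + 1])
# Assumes a default integer index and a single NaN run per column
df.fillna(t1 / (2 ** s) + t2 * (1 - 0.5 ** s))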
Out[212]: 
          A         B         C
0 -0.166919  0.979728 -0.632955
1 -0.297953 -0.912674 -1.365463
2 -0.120211 -0.540679 -0.680481
3 -0.454142 -2.027325  1.533582
4 -0.621107 -1.319834  0.461821
5 -0.788073 -0.966089 -1.260201
6 -0.916080 -0.612343 -2.121213
7 -0.887858  1.033826 -2.551718
8  1.948430  1.025011 -2.982224
9  0.019698 -0.795876 -0.046431

Explanation:

Let x be the last valid value before the run of NaNs and y the first valid value after it.

1st NaN: x/2 + y/2 = 1st

2nd NaN: 1st/2 + y/2 = 2nd

3rd NaN: 2nd/2 + y/2 = 3rd

So the n-th NaN is x/(2**n) + (y/2)(1-(1/2)**n)/(1-1/2) = x/(2**n) + y(1-(1/2)**n). This is the key.
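
As a sanity check on the geometric-progression argument, the sketch below compares the step-by-step rule with the closed form x/(2**n) + y*(1 - (1/2)**n), where x is the value before the NaN run and y the value after it. The function names are hypothetical and the numbers are taken from column C of the question.

```python
def nth_fill_iterative(x: float, y: float, n: int) -> float:
    """Apply 'average the previous value with y' n times, starting from x."""
    value = x
    for _ in range(n):
        value = (value + y) / 2
    return value


def nth_fill_closed_form(x: float, y: float, n: int) -> float:
    """Closed form of the same recurrence: x / 2**n + y * (1 - 0.5**n)."""
    return x / 2**n + y * (1 - 0.5**n)


# Column C: 0.461821 precedes the run of three NaNs, -2.982224 follows it
x, y = 0.461821, -2.982224
for n in (1, 2, 3):
    step_by_step = nth_fill_iterative(x, y, n)
    closed_form = nth_fill_closed_form(x, y, n)
    assert abs(step_by_step - closed_form) < 1e-12
    print(n, step_by_step, closed_form)
```

The two agree because the y/2 terms contributed at each step form a geometric series with ratio 1/2, which sums to y*(1 - (1/2)**n).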
