Pandas: Dynamically replace NaN values with the average of previous and next non-missing values
Problem description
I have a dataframe df with NaN values, and I want to dynamically replace them with the average of the previous and next non-missing values.
In [27]: df
Out[27]:
A B C
0 -0.166919 0.979728 -0.632955
1 -0.297953 -0.912674 -1.365463
2 -0.120211 -0.540679 -0.680481
3 NaN -2.027325 1.533582
4 NaN NaN 0.461821
5 -0.788073 NaN NaN
6 -0.916080 -0.612343 NaN
7 -0.887858 1.033826 NaN
8 1.948430 1.025011 -2.982224
9 0.019698 -0.795876 -0.046431
For example, A[3] is NaN, so its value should be (-0.120211-0.788073)/2 = -0.454142. A[4] should then be (-0.454142-0.788073)/2 = -0.621108, using the newly filled A[3] as the previous value.
Therefore, the resulting dataframe should look like:
In [27]: df
Out[27]:
A B C
0 -0.166919 0.979728 -0.632955
1 -0.297953 -0.912674 -1.365463
2 -0.120211 -0.540679 -0.680481
3 -0.454142 -2.027325 1.533582
4 -0.621108 -1.319834 0.461821
5 -0.788073 -0.966089 -1.260202
6 -0.916080 -0.612343 -2.121213
7 -0.887858 1.033826 -2.551718
8 1.948430 1.025011 -2.982224
9 0.019698 -0.795876 -0.046431
Is this a good way to deal with the missing values? I can't simply replace them with each column's mean, because my data is a time series and tends to increase over time. (The initial value may be $0 and the final value $100000, so the mean is $50000, which can be much larger or smaller than the missing values themselves.)
Recommended answer
The repeated averaging you describe is a geometric progression, so every NaN in a run can be filled with a closed-form expression:
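import pandas as pd
# Running NaN count per column; assumes a default RangeIndex and one NaN run per column
s = df.isnull().cumsum()
# x: the value just before each column's NaN run
t1 = df[(s == 1).shift(-1, fill_value=False)].stack().dropna().reset_index(level=0, drop=True)
# y: the value just after each run (DataFrame.lookup was removed in pandas 2.0)
t2 = pd.Series({col: df.at[row + 1, col] for col, row in s.idxmax().items()})
df.fillna(t1 / (2 ** s) + t2 * (1 - 0.5 ** s))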
Out[212]:
A B C
0 -0.166919 0.979728 -0.632955
1 -0.297953 -0.912674 -1.365463
2 -0.120211 -0.540679 -0.680481
3 -0.454142 -2.027325 1.533582
4 -0.621107 -1.319834 0.461821
5 -0.788073 -0.966089 -1.260201
6 -0.916080 -0.612343 -2.121213
7 -0.887858 1.033826 -2.551718
8 1.948430 1.025011 -2.982224
9 0.019698 -0.795876 -0.046431
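The snippet above relies on each column having a single run of NaNs. For data with several gaps per column, the rule from the question can also be applied directly with a loop. The following is only a minimal sketch, not the answerer's method: the function name fill_recursive_mean is hypothetical, and it assumes that leading NaNs (no previous value) and trailing NaNs (no next value) should simply be left as they are.

```python
import numpy as np
import pandas as pd


def fill_recursive_mean(df: pd.DataFrame) -> pd.DataFrame:
    """Fill each NaN with the mean of the value just above it (already
    filled, if it was missing) and the next non-missing value below it.

    Hypothetical helper for illustration only.
    """
    out = df.astype(float)  # astype returns a new frame, df is not modified
    nxt = out.bfill()       # next non-missing value at or below each cell
    for col_pos in range(out.shape[1]):
        col = out.iloc[:, col_pos].to_numpy(copy=True)
        nxt_col = nxt.iloc[:, col_pos].to_numpy()
        for i in range(1, len(col)):
            # Assumption: NaNs with no previous or no next value stay NaN.
            if (
                np.isnan(col[i])
                and not np.isnan(col[i - 1])
                and not np.isnan(nxt_col[i])
            ):
                col[i] = (col[i - 1] + nxt_col[i]) / 2
        out.iloc[:, col_pos] = col
    return out


# The dataframe from the question.
df = pd.DataFrame({
    "A": [-0.166919, -0.297953, -0.120211, np.nan, np.nan,
          -0.788073, -0.916080, -0.887858, 1.948430, 0.019698],
    "B": [0.979728, -0.912674, -0.540679, -2.027325, np.nan,
          np.nan, -0.612343, 1.033826, 1.025011, -0.795876],
    "C": [-0.632955, -1.365463, -0.680481, 1.533582, 0.461821,
          np.nan, np.nan, np.nan, -2.982224, -0.046431],
})
result = fill_recursive_mean(df)
print(result.round(6))
```

The Python loop is slower than the vectorized closed form on large frames, but it makes no assumption about how many gaps a column contains.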
Explanation (x is the value before the NaN run, y is the value after it):
1st NaN: x/2 + y/2 = 1st
2nd NaN: 1st/2 + y/2 = 2nd
3rd NaN: 2nd/2 + y/2 = 3rd
So the n-th NaN is x/(2**n) + (y/2)*(1-(1/2)**n)/(1-1/2) = x/(2**n) + y*(1-(1/2)**n). This is the key.
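For readers who want the intermediate step, the closed form can be derived as follows. The notation f_k for the k-th filled value in a run is introduced here for illustration and does not appear in the original answer; x is the last value before the run and y is the first value after it.

```latex
\begin{aligned}
f_0 &= x, \qquad f_k = \tfrac{1}{2} f_{k-1} + \tfrac{1}{2} y \\
f_n &= \frac{x}{2^n} + y \sum_{k=1}^{n} \frac{1}{2^k}
     = \frac{x}{2^n} + \frac{y}{2} \cdot \frac{1 - (1/2)^n}{1 - 1/2}
     = \frac{x}{2^n} + y \left( 1 - \frac{1}{2^n} \right)
\end{aligned}
```

As a check against the question's numbers, A[4] is the second NaN in its run (n = 2) with x = -0.120211 and y = -0.788073, which gives -0.120211/4 + 0.75 * (-0.788073) = -0.6211075.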