Weighted average of dataframes with mask on NaNs
Question
I have found some answers about averaging dataframes, but none that handles weights. I have figured out a way to get the result I want (see title), but I wonder if there is a more direct way of achieving the same goal.
I need to average more than two dataframes, but the example code below only includes two of them.
import pandas as pd
import numpy as np
df1 = pd.DataFrame([[np.nan, 2, np.nan, 0],
                    [3, 4, np.nan, 1],
                    [np.nan, np.nan, np.nan, 5],
                    [np.nan, 3, np.nan, 4]],
                   columns=list('ABCD'))
df2 = pd.DataFrame([[3, 1, np.nan, 1],
                    [2, 5, np.nan, 3],
                    [np.nan, 4, np.nan, 2],
                    [np.nan, 2, 1, 5]],
                   columns=list('ABCD'))
What I do is:
- Convert each dataframe into an array of arrays (one per row), and put all the converted dataframes into a single array:
def fromDfToArraysStack(df):
    # Stack the rows of df one by one into a 2-D array
    for i in range(len(df)):
        arrayRow = df.iloc[i].values
        if i == 0:
            arraysStack = arrayRow
        else:
            arraysStack = np.vstack((arraysStack, arrayRow))
    return arraysStack
arraysStack1 = fromDfToArraysStack(df1)
arraysStack2 = fromDfToArraysStack(df2)
arrayOfArrays = np.array([arraysStack1, arraysStack2])
- Mask the NaNs and take the weighted average:
masked = np.ma.masked_array(arrayOfArrays,
                            np.isnan(arrayOfArrays))
arrayAve = np.ma.average(masked,
                         axis=0,
                         weights=[1, 2])
- Convert back to a dataframe while putting the NaNs back in:
pd.DataFrame(np.row_stack(arrayAve.filled(np.nan)))
          0         1    2         3
0  3.000000  1.333333  NaN  0.666667
1  2.333333  4.666667  NaN  2.333333
2       NaN  4.000000  NaN  3.000000
3       NaN  2.333333  1.0  4.666667
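To see where the numbers in this output come from, it may help to look at single cells. The sketch below is only an illustration (the variable names are made up for it): with weights 1 and 2, a cell that is valid in both dataframes gets the ordinary weighted mean, a cell that is NaN in one dataframe falls back to the other value, and a cell that is NaN in both stays masked.

```python
import numpy as np

# One value per dataframe for the same cell, with weights 1 and 2.
weights = np.array([1.0, 2.0])

# Row 0, column B: both values are present -> (1*2 + 2*1) / (1 + 2)
both = np.ma.masked_invalid([2.0, 1.0])
avg_both = np.ma.average(both, weights=weights)

# Row 0, column A: the first value is NaN -> only the second one counts
one = np.ma.masked_invalid([np.nan, 3.0])
avg_one = np.ma.average(one, weights=weights)

# Row 0, column C: both values are NaN -> the result is masked
none = np.ma.masked_invalid([np.nan, np.nan])
avg_none = np.ma.average(none, weights=weights)

print(avg_both, avg_one, np.ma.is_masked(avg_none))
```

In other words, the weights are renormalized over the entries that are not masked, which is why masking the NaNs gives a different result from filling them with zero.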
As I said, this works, but hopefully there is a more concise way to do it. A one-liner, anybody?
Recommended answer
To make it a tidy one-liner, I cheated a little with the imports, but here is the best I could do:
import pandas as pd
import numpy as np
from numpy.ma import average as avg
from numpy.ma import masked_array as ma
df1 = pd.DataFrame([[np.nan, 2, np.nan, 0],
                    [3, 4, np.nan, 1],
                    [np.nan, np.nan, np.nan, 5],
                    [np.nan, 3, np.nan, 4]],
                   columns=list('ABCD'))
df2 = pd.DataFrame([[3, 1, np.nan, 1],
                    [2, 5, np.nan, 3],
                    [np.nan, 4, np.nan, 2],
                    [np.nan, 2, 1, 5]],
                   columns=list('ABCD'))
df1.combine(df2, lambda x, y: avg([ma(x, np.isnan(x)), ma(y, np.isnan(y))], 0, [1, 2]))
# For any number of dataframes, the same idea as a function:
def df_average(dfs, wgts):
    return pd.DataFrame(avg([ma(df.values, np.isnan(df.values)) for df in dfs], 0, wgts))

df_average(dfs=[df1, df2], wgts=[1, 2])
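For comparison, here is a sketch of the same computation without masked arrays. It is not part of the answer above: `weighted_nanmean` is a hypothetical helper name, and it assumes that all dataframes share the same index and columns and that all weights are positive. It zeroes out the NaN cells in the numerator and counts only the weights of valid cells in the denominator, and it keeps the original column labels.

```python
import numpy as np
import pandas as pd

def weighted_nanmean(dfs, weights):
    """Weighted mean across dataframes, ignoring NaN cells (hypothetical helper)."""
    # Shape (n_frames, n_rows, n_cols); assumes identical index and columns.
    stack = np.stack([df.to_numpy(dtype=float) for df in dfs])
    w = np.asarray(weights, dtype=float).reshape(-1, 1, 1)
    valid = ~np.isnan(stack)
    # Weighted sum over valid cells only; NaN cells contribute 0.
    num = np.where(valid, stack * w, 0.0).sum(axis=0)
    # Sum of the weights that belong to valid cells.
    den = (valid * w).sum(axis=0)
    # Cells with no valid value have den == 0 and are left as NaN
    # (this relies on the assumption that weights are positive).
    out = np.full(num.shape, np.nan)
    np.divide(num, den, out=out, where=den > 0)
    return pd.DataFrame(out, index=dfs[0].index, columns=dfs[0].columns)

df1 = pd.DataFrame([[np.nan, 2, np.nan, 0],
                    [3, 4, np.nan, 1],
                    [np.nan, np.nan, np.nan, 5],
                    [np.nan, 3, np.nan, 4]],
                   columns=list('ABCD'))
df2 = pd.DataFrame([[3, 1, np.nan, 1],
                    [2, 5, np.nan, 3],
                    [np.nan, 4, np.nan, 2],
                    [np.nan, 2, 1, 5]],
                   columns=list('ABCD'))

result = weighted_nanmean([df1, df2], [1, 2])
print(result)
```

Note that `df.to_numpy()` already returns the 2-D array that the row-by-row `np.vstack` loop in the question builds, so the stacking step reduces to a single `np.stack` call.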