Weighted average of dataframes with mask on NaNs


Question

I have found some answers about averaging dataframes, but none that handles weights. I have figured out a way to get the result I want (see title), but I wonder whether there is a more direct way to achieve the same goal.

I need to average more than two dataframes, but the example code below only includes two of them.

import pandas as pd
import numpy as np

df1 = pd.DataFrame([[np.nan, 2, np.nan, 0],
                    [3, 4, np.nan, 1],
                    [np.nan, np.nan, np.nan, 5],
                    [np.nan, 3, np.nan, 4]],
                   columns=list('ABCD'))

df2 = pd.DataFrame([[3, 1, np.nan, 1],
                    [2, 5, np.nan, 3],
                    [np.nan, 4, np.nan, 2],
                    [np.nan, 2, 1, 5]],
                   columns=list('ABCD'))

What I do is:

  • Convert each dataframe into an array of row arrays, then put all the converted dataframes into one array:
def fromDfToArraysStack(df):

    for i in range(len(df)):
         arrayRow = df.iloc[i].values

         if i == 0:
             arraysStack = arrayRow
         else:
             arraysStack = np.vstack((arraysStack, arrayRow))

    return arraysStack

arraysStack1 = fromDfToArraysStack(df1)
arraysStack2 = fromDfToArraysStack(df2)
arrayOfArrays = np.array([arraysStack1, arraysStack2])

  • Mask the NaNs and take the weighted average:
masked = np.ma.masked_array(arrayOfArrays,
                            np.isnan(arrayOfArrays))
arrayAve = np.ma.average(masked,
                         axis=0,
                         weights=[1, 2])
    

  • Convert back to a dataframe, putting the NaNs back in:
pd.DataFrame(np.vstack(arrayAve.filled(np.nan)))

          0         1    2         3
0  3.000000  1.333333  NaN  0.666667
1  2.333333  4.666667  NaN  2.333333
2       NaN  4.000000  NaN  3.000000
3       NaN  2.333333  1.0  4.666667
      
      

As I said, this works, but hopefully there is a more concise way to do it. A one-liner, anybody?
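To make the arithmetic behind the masked average concrete, here is a minimal sketch that computes the same result with plain NumPy. For each cell it sums weight × value over the frames where the value is present, then divides by the sum of the weights of those frames only. The variable names (`stack`, `valid`, `explicit`) are illustrative and not part of the original post.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([[np.nan, 2, np.nan, 0],
                    [3, 4, np.nan, 1],
                    [np.nan, np.nan, np.nan, 5],
                    [np.nan, 3, np.nan, 4]],
                   columns=list('ABCD'))

df2 = pd.DataFrame([[3, 1, np.nan, 1],
                    [2, 5, np.nan, 3],
                    [np.nan, 4, np.nan, 2],
                    [np.nan, 2, 1, 5]],
                   columns=list('ABCD'))

weights = [1, 2]

# Stack the frames into one array of shape (n_frames, n_rows, n_cols).
stack = np.stack([df.to_numpy(dtype=float) for df in (df1, df2)])
valid = ~np.isnan(stack)

# Reshape the weights so they broadcast along the frame axis.
w = np.asarray(weights, dtype=float).reshape(-1, 1, 1)

# NaN cells contribute neither to the weighted sum nor to the weight total.
numerator = np.where(valid, stack * w, 0.0).sum(axis=0)
denominator = (valid * w).sum(axis=0)

# Cells that are NaN in every frame have a zero denominator; keep them NaN.
result = np.full(numerator.shape, np.nan)
np.divide(numerator, denominator, out=result, where=denominator > 0)

explicit = pd.DataFrame(result, index=df1.index, columns=df1.columns)
print(explicit)
```

For example, cell (0, B) is (1·2 + 2·1) / (1 + 2) ≈ 1.333, while cell (0, A) uses only the second frame and is (2·3) / 2 = 3.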

Recommended answer

To make it a tidy one-liner, I cheated a little with the imports, but here is the best I could do:

import pandas as pd
import numpy as np
from numpy.ma import average as avg
from numpy.ma import masked_array as ma

df1 = pd.DataFrame([[np.nan, 2, np.nan, 0],
                    [3, 4, np.nan, 1],
                    [np.nan, np.nan, np.nan, 5],
                    [np.nan, 3, np.nan, 4]],
                   columns=list('ABCD'))

df2 = pd.DataFrame([[3, 1, np.nan, 1],
                    [2, 5, np.nan, 3],
                    [np.nan, 4, np.nan, 2],
                    [np.nan, 2, 1, 5]],
                   columns=list('ABCD'))

df1.combine(df2, lambda x, y: avg([ma(x, np.isnan(x)), ma(y, np.isnan(y))], 0, [1, 2]))
      

For more than two dataframes, the same idea can be wrapped in a function that takes a list of dataframes and a matching list of weights (reusing the imports and df1, df2 defined above):

def df_average(dfs, wgts):
    # Mask the NaNs in each frame, then average across frames (axis 0) with the given weights.
    return pd.DataFrame(avg([ma(df.values, np.isnan(df.values)) for df in dfs], 0, wgts))


df_average(dfs=[df1, df2], wgts=[1, 2])
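
The function above returns a dataframe with default integer column labels. As a hedged sketch of one possible variant, the helper below keeps the index and column labels of the first dataframe. The name `df_average_labeled` is hypothetical, and it assumes every dataframe is meant to share the first one's labels (frames are reindexed to them before stacking).

```python
import numpy as np
import pandas as pd


def df_average_labeled(dfs, wgts):
    """Hypothetical helper: NaN-aware weighted average that keeps labels."""
    first = dfs[0]
    # Assumption: all frames should line up with the first frame's labels.
    stack = np.stack([
        df.reindex(index=first.index, columns=first.columns).to_numpy(dtype=float)
        for df in dfs
    ])
    # Mask NaNs so they are excluded from both the sum and the weight total.
    masked = np.ma.masked_invalid(stack)
    averaged = np.ma.average(masked, axis=0, weights=wgts)
    # Cells masked in every frame come back as NaN.
    return pd.DataFrame(np.ma.filled(averaged, np.nan),
                        index=first.index, columns=first.columns)


df1 = pd.DataFrame([[np.nan, 2, np.nan, 0],
                    [3, 4, np.nan, 1],
                    [np.nan, np.nan, np.nan, 5],
                    [np.nan, 3, np.nan, 4]],
                   columns=list('ABCD'))

df2 = pd.DataFrame([[3, 1, np.nan, 1],
                    [2, 5, np.nan, 3],
                    [np.nan, 4, np.nan, 2],
                    [np.nan, 2, 1, 5]],
                   columns=list('ABCD'))

labeled = df_average_labeled([df1, df2], [1, 2])
print(labeled)
```

The list can hold any number of dataframes, as long as `wgts` has one weight per dataframe.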
      
