Covariance Matrix Python - Omit -9999 Value


Question

I'm trying to calculate the covariance matrix of two completely overlapping images using Python. The code is:

      import numpy as np
      # image1 and image2 are 2-D arrays of the same shape; each becomes one row
      stacked = np.vstack((image1.ravel(), image2.ravel()))
      np.cov(stacked)

The issue with this method is that the images may contain a NoData value such as -9999, signifying that the pixel value isn't present. In that case np.cov still includes the value, which drastically shifts the mean of each image and gives the wrong covariance.

If I try to remove the NoData values, I run into a dimensionality problem: the two images no longer have the same number of elements, so the covariance matrix cannot be computed.

Computing it by hand would be very time-consuming.

Is there a way to overcome the NoData issue and calculate the covariance matrix correctly?

Recommended answer

Your best option is to use the methods provided with numpy's masked arrays, one of which computes the covariance matrix when masked items are present:

      >>> import numpy as np
      >>> mask_value = -9999
      >>> a = np.array([1, 2, mask_value, 4])
      >>> b = np.array([1, mask_value, 3, 4])
      >>> c = np.vstack((a,b))
      >>> 
      >>> # Note: testing for equality is a bad idea with floats. These are integers, so it's okay here.
      >>> masked_a, masked_b, masked_c = [np.ma.array(x, mask=x==mask_value) for x in (a,b,c)]
      >>> 
      >>> result = np.ma.cov(masked_c)
      >>> result
      masked_array(data =
       [[2.333333333333333 4.444444444444445]
       [4.444444444444445 2.333333333333333]],
                   mask =
       [[False False]
       [False False]],
             fill_value = 1e+20)
      
      >>> np.cov([1,2,4]) # autocovariance when just one element is masked is the same as the previous result[0,0]
      array(2.333333333333333)
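
The numbers above can be checked by hand. The sketch below is an illustration, not a copy of numpy's internals: it assumes each row is centered on the mean of its own unmasked entries, and that each matrix entry is normalized by the number of positions valid in both rows involved, minus one. Under those assumptions it reproduces the printed values. The variable names are made up for the example.

```python
import numpy as np

# Same toy data as above, with -9999 masked out.
a = np.ma.masked_equal([1, 2, -9999, 4], -9999)
b = np.ma.masked_equal([1, -9999, 3, 4], -9999)

# Center each row on the mean of its own valid entries; masked slots contribute 0.
da = (a - a.mean()).filled(0.0)
db = (b - b.mean()).filled(0.0)

# Positions where both rows hold a valid value.
both_valid = ~np.ma.getmaskarray(a) & ~np.ma.getmaskarray(b)

var_a = (da * da).sum() / (a.count() - 1)          # uses the 3 valid entries of a
cov_ab = (da * db).sum() / (both_valid.sum() - 1)  # uses the 2 jointly valid entries

print(var_a, cov_ab)
```

Because the diagonal and off-diagonal entries are built from different sets of samples, the result is not guaranteed to be a valid covariance matrix. Here the covariance (about 4.44) exceeds both variances (about 2.33), which would imply a correlation above 1.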
      

The results differ depending on how you call np.ma.cov:

      >>> np.ma.cov(masked_a, masked_b)
      masked_array(data =
       [[4.5 4.5]
       [4.5 4.5]],
                   mask =
       [[False False]
       [False False]],
             fill_value = 1e+20)
      
      >>> np.cov([1,4])  # result of the autocovariance when 2 of the 4 values are masked
      array(4.5)
      

The reason is that the latter approach combines the masks of the two variables, like this:

      >>> mask2 = masked_c.mask.any(axis=0)
      >>> all_masked_c = np.ma.array(c, mask=np.vstack((mask2, mask2)))
      >>> all_masked_c
      masked_array(data =
       [[1 -- -- 4]
       [1 -- -- 4]],
                   mask =
       [[False  True  True False]
       [False  True  True False]],
             fill_value = 999999)
      
      >>> np.ma.cov(all_masked_c) # same function call as the first approach, but with a different mask!
      masked_array(data =
       [[4.5 4.5]
       [4.5 4.5]],
                   mask =
       [[False False]
       [False False]],
             fill_value = 1e+20)
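
The same combined-mask idea can be sketched without masked arrays, which also sidesteps the dimensionality problem from the question: build one boolean index of the positions that are valid in both arrays and apply it to both, so they stay the same length. `NODATA` and `valid` are names chosen for this example. As noted above, the equality test assumes integer data.

```python
import numpy as np

NODATA = -9999  # assumed NoData sentinel

a = np.array([1, 2, NODATA, 4])
b = np.array([1, NODATA, 3, 4])

# Keep only positions where neither array holds the sentinel.
valid = (a != NODATA) & (b != NODATA)

# Both selections use the same index, so the rows have equal length.
cov = np.cov(np.vstack((a[valid], b[valid])))
print(cov)
```

Only the first and last positions survive here, so every entry equals the 4.5 shown above.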
      

So use np.ma.cov, but decide how you want the data to be interpreted when the masked values in the two images do not overlap.
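
Applied to the two images from the question, this might look like the sketch below. The helper name `image_cov`, its `joint` flag and the sample arrays are hypothetical. It assumes floating-point images, so it uses `np.ma.masked_values` (a tolerance-based comparison) rather than `==`.

```python
import numpy as np

NODATA = -9999.0  # assumed NoData sentinel


def image_cov(image1, image2, nodata=NODATA, joint=True):
    """Covariance matrix of two same-shaped images, ignoring NoData pixels.

    joint=True  -> a pixel is dropped if it is NoData in either image.
    joint=False -> each image keeps its own mask (pairwise normalization).
    """
    m1 = np.ma.masked_values(np.asarray(image1, dtype=float), nodata).ravel()
    m2 = np.ma.masked_values(np.asarray(image2, dtype=float), nodata).ravel()
    if joint:
        # Passing two arguments makes np.ma.cov combine the two masks.
        return np.ma.cov(m1, m2)
    # np.ma.vstack keeps each row's own mask.
    return np.ma.cov(np.ma.vstack((m1, m2)))


image1 = np.array([[1.0, 2.0], [NODATA, 4.0]])
image2 = np.array([[1.0, NODATA], [3.0, 4.0]])

joint_cov = image_cov(image1, image2)
pairwise_cov = image_cov(image1, image2, joint=False)
print(joint_cov)
print(pairwise_cov)
```

With `joint=True` every entry is computed from the same set of pixels, which is usually the safer default when the matrix feeds into something like PCA. `joint=False` uses more of the data, but the entries are normalized over different pixel sets.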
