rcpp: removing NAs in a moving window calculation

Problem description

My idea is to calculate several statistics in a 2 x 2 moving window. For example, the code below calculates the mean value in each window. It works well when the input data has no NA values, but it gives bad results when NAs are present (they are treated as the lowest int). Can you guide me on how it could be improved, for example by excluding NAs from these calculations?

#include <RcppArmadillo.h>
using namespace Rcpp;

// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::export]]
Rcpp::NumericMatrix get_mw_mean(arma::imat x){
  int num_r = x.n_rows - 1;
  int num_c = x.n_cols - 1;

  arma::dmat result(num_r, num_c);

  for (int i = 0; i < num_r; i++) {
    for (int j = 0; j < num_c; j++) {
      arma::imat sub_x = x.submat(i, j, i + 1, j + 1);
      arma::ivec sub_x_v = vectorise(sub_x);

      arma::vec sub_x_v2 = arma::conv_to<arma::vec>::from(sub_x_v);
      double sub_mean = arma::mean(sub_x_v2);
      result(i, j) = sub_mean;
    }
  }
  return(wrap(result));
}

/*** R
new_c1 = c(1, 86, 98,
           15, 5, 85,
           32, 25, 68)
lg1 = matrix(new_c1, nrow = 3, byrow = TRUE)
get_mw_mean(lg1)
new_c2 = c(NA, 86, 98,
           15, NA, 85,
           32, 25, 68)
lg2 = matrix(new_c2, nrow = 3, byrow = TRUE)
get_mw_mean(lg2)
*/

Cheers, 记记

Recommended answer

There are two things going on here:

  1. The matrix input type, arma::imat, stores signed int values, but NA and NaN exist only in floating-point types such as float or double. By design, int has no NA or NaN placeholder, so the conversion that occurs falls back to INT_MIN.

  2. The NA or NaN values therefore need to be subset out in C++ for ints.
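The two points above can be illustrated with a small sketch in standard C++. The helper name is_int_na is hypothetical and not part of Rcpp or Armadillo. It relies on the fact that R encodes NA_integer_ as INT_MIN, which is the value the answer filters on.

```cpp
#include <climits>
#include <limits>

// int has no NaN representation, so a missing value needs a sentinel.
static_assert(!std::numeric_limits<int>::has_quiet_NaN,
              "int cannot encode NaN");

// Hypothetical helper: treats INT_MIN as the integer missing-value sentinel,
// matching how R encodes NA_integer_.
inline bool is_int_na(int v) {
  return v == INT_MIN;
}
```

A side effect of the sentinel approach is that a genuine INT_MIN in the data cannot be told apart from a missing value.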

So, the way forward is to be able to detect this INT_MIN value and remove it from the matrix. One way to accomplish this is to use find() to identify the elements that do not match INT_MIN, and .elem() to extract the identified elements.
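As a library-free illustration of the same idea, the sketch below computes the mean of one 2 x 2 window while skipping cells equal to INT_MIN. It uses plain std::vector rather than Armadillo, and the names Grid and window_mean_skip_na are hypothetical. Returning NaN for a window with no valid cells is an assumption of this sketch, not something the answer specifies.

```cpp
#include <climits>
#include <cmath>
#include <cstddef>
#include <vector>

// Row-major integer grid; INT_MIN marks a missing value (assumption: the
// same sentinel the answer filters on).
using Grid = std::vector<std::vector<int>>;

// Hypothetical helper: mean of the 2 x 2 window whose top-left cell is
// (i, j), skipping sentinel cells. Returns NaN when every cell is missing.
// The caller must ensure that (i + 1, j + 1) is inside the grid.
double window_mean_skip_na(const Grid& x, std::size_t i, std::size_t j) {
  double sum = 0.0;
  int n = 0;
  for (std::size_t r = i; r <= i + 1; ++r) {
    for (std::size_t c = j; c <= j + 1; ++c) {
      if (x[r][c] != INT_MIN) {  // keep only non-missing cells
        sum += x[r][c];
        ++n;
      }
    }
  }
  return n > 0 ? sum / n : std::nan("");
}
```

Counting the valid cells explicitly makes the all-missing case visible, which a filter-then-mean approach has to handle separately.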

For cases involving double, e.g. arma::mat, arma::vec, and so on, consider using find_finite().

Implementation

#include <RcppArmadillo.h>
#include <climits>  // INT_MIN

// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::export]]
arma::mat get_mw_mean_na(arma::imat x){
  int num_r = x.n_rows - 1;
  int num_c = x.n_cols - 1;

  // Debug output: prints the matrix as received, so the value standing in for NA is visible
  Rcpp::Rcout << x << std::endl;

  arma::dmat result(num_r, num_c);

  for (int i = 0; i < num_r; i++) {
    for (int j = 0; j < num_c; j++) {
      arma::imat sub_x = x.submat(i, j, i + 1, j + 1);
      // Conversion + Search for NA values
      arma::vec sub_x_v2 = arma::conv_to<arma::vec>::from( 
                                        sub_x.elem( find(sub_x != INT_MIN) ) 
      );

      result(i, j) = arma::mean(sub_x_v2);
    }
  }

  return result;
}

Output

new_c1 = c(1, 86, 98,
           15, 5, 85,
           32, 25, 68)
lg1 = matrix(new_c1, nrow = 3, byrow = TRUE)
get_mw_mean_na(lg1)
#        [,1]  [,2]
# [1,] 26.75 68.50
# [2,] 19.25 45.75

new_c2 = c(NA, 86, 98,
           15, NA, 85,
           32, 25, 68)
lg2 = matrix(new_c2, nrow = 3, byrow = TRUE)
get_mw_mean_na(lg2)
#      [,1]     [,2]
# [1,] 50.5 89.66667
# [2,] 24.0 59.33333
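
For the double case mentioned earlier, the same filtering can be sketched without Armadillo by testing each value with std::isfinite. The helper name finite_mean is hypothetical. Like find_finite(), this check rejects NaN (which includes R's NA_real_) and also rejects infinities, which may or may not be what a given analysis wants.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper: mean of the finite entries of a window.
// Returns NaN when the window contains no finite value.
double finite_mean(const std::vector<double>& window) {
  double sum = 0.0;
  int n = 0;
  for (double v : window) {
    if (std::isfinite(v)) {  // false for NaN and for +/- infinity
      sum += v;
      ++n;
    }
  }
  return n > 0 ? sum / n : std::nan("");
}
```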
