rcpp: removing NAs in a moving window calculation
Problem description
My idea is to calculate several statistics in a moving window (2 by 2). For example, the code below calculates the mean value in a moving window. It works well when the input data has no NA values, but gives bad results when NAs are in the dataset (they are treated as the lowest int). Can you guide me on how to improve it, for example by excluding NAs from these calculations?
#include <RcppArmadillo.h>
using namespace Rcpp;
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
Rcpp::NumericMatrix get_mw_mean(arma::imat x){
  int num_r = x.n_rows - 1;
  int num_c = x.n_cols - 1;
  arma::dmat result(num_r, num_c);
  for (int i = 0; i < num_r; i++) {
    for (int j = 0; j < num_c; j++) {
      arma::imat sub_x = x.submat(i, j, i + 1, j + 1);
      arma::ivec sub_x_v = vectorise(sub_x);
      arma::vec sub_x_v2 = arma::conv_to<arma::vec>::from(sub_x_v);
      double sub_mean = arma::mean(sub_x_v2);
      result(i, j) = sub_mean;
    }
  }
  return(wrap(result));
}
/*** R
new_c1 = c(1, 86, 98,
15, 5, 85,
32, 25, 68)
lg1 = matrix(new_c1, nrow = 3, byrow = TRUE)
get_mw_mean(lg1)
new_c2 = c(NA, 86, 98,
15, NA, 85,
32, 25, 68)
lg2 = matrix(new_c2, nrow = 3, byrow = TRUE)
get_mw_mean(lg2)
*/
Cheers
Recommended answer
There are two things happening here:
- The matrix input type, arma::imat, holds signed int values, but NA and NaN only exist in float or double types. By design, an int cannot hold an NA or NaN placeholder, so the conversion that occurs drops the value to INT_MIN.
- NA or NaN values therefore need to be subset out in C++ for ints.
So, the way forward is to be able to detect this INT_MIN value and remove it from the matrix. One way to accomplish this is to use find() to identify the elements that do not match INT_MIN and .elem() to extract the identified elements.
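The filtering idea can be sketched without Armadillo as well. The block below is a plain C++ analogue rather than the answer's implementation: the type aliases and the function window_mean_skip_na are hypothetical names, and the input is assumed to be a rectangular matrix in which a missing value is stored as INT_MIN. What should happen when every cell of a window is missing is a design choice; this sketch returns NaN in that case.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <limits>
#include <vector>

using IntMatrix = std::vector<std::vector<int>>;
using DoubleMatrix = std::vector<std::vector<double>>;

// Hypothetical helper (not part of Rcpp or Armadillo): 2 x 2 moving-window
// mean over a rectangular matrix, skipping cells equal to INT_MIN.
DoubleMatrix window_mean_skip_na(const IntMatrix& x) {
    assert(x.size() >= 2 && x[0].size() >= 2);
    const std::size_t num_r = x.size() - 1;
    const std::size_t num_c = x[0].size() - 1;
    DoubleMatrix result(num_r, std::vector<double>(num_c, 0.0));

    for (std::size_t i = 0; i < num_r; ++i) {
        for (std::size_t j = 0; j < num_c; ++j) {
            double sum = 0.0;
            int count = 0;
            // Visit the four cells of the window anchored at (i, j).
            for (std::size_t di = 0; di < 2; ++di) {
                for (std::size_t dj = 0; dj < 2; ++dj) {
                    const int v = x[i + di][j + dj];
                    if (v != INT_MIN) {  // keep only non-missing cells
                        sum += v;
                        ++count;
                    }
                }
            }
            // Assumption: a window with no valid cells yields NaN.
            result[i][j] = (count > 0)
                ? sum / count
                : std::numeric_limits<double>::quiet_NaN();
        }
    }
    return result;
}
```

Counting the valid cells per window, instead of dividing by four, is what keeps the mean correct when some cells are dropped.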
For cases involving double, e.g. arma::mat, arma::vec, et cetera, consider using find_finite().
Implementation
#include <RcppArmadillo.h>
#include <climits>  // INT_MIN
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::mat get_mw_mean_na(arma::imat x){
  int num_r = x.n_rows - 1;
  int num_c = x.n_cols - 1;
  // Debug print of the input matrix as received in C++
  Rcpp::Rcout << x << std::endl;
  arma::dmat result(num_r, num_c);
  for (int i = 0; i < num_r; i++) {
    for (int j = 0; j < num_c; j++) {
      arma::imat sub_x = x.submat(i, j, i + 1, j + 1);
      // Conversion + Search for NA values
      arma::vec sub_x_v2 = arma::conv_to<arma::vec>::from(
        sub_x.elem( find(sub_x != INT_MIN) )
      );
      result(i, j) = arma::mean(sub_x_v2);
    }
  }
  return result;
}
Output
new_c1 = c(1, 86, 98,
15, 5, 85,
32, 25, 68)
lg1 = matrix(new_c1, nrow = 3, byrow = TRUE)
get_mw_mean_na(lg1)
# [,1] [,2]
# [1,] 26.75 68.50
# [2,] 19.25 45.75
new_c2 = c(NA, 86, 98,
15, NA, 85,
32, 25, 68)
lg2 = matrix(new_c2, nrow = 3, byrow = TRUE)
get_mw_mean_na(lg2)
# [,1] [,2]
# [1,] 50.5 89.66667
# [2,] 24.0 59.33333
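For the double case mentioned earlier, no sentinel comparison is needed, because NA and NaN are not finite values. The sketch below is a plain C++ analogue of the find_finite() idea. It does not call Armadillo, and mean_of_finite and finite_mean_self_check are hypothetical names. It uses std::isfinite to keep only finite values, which means infinities are dropped too, and it returns NaN when nothing finite remains (an assumption of this sketch).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: mean of the finite entries only (skips NaN and +/-Inf).
double mean_of_finite(const std::vector<double>& v) {
    double sum = 0.0;
    std::size_t count = 0;
    for (double e : v) {
        if (std::isfinite(e)) {
            sum += e;
            ++count;
        }
    }
    // Assumption: with no finite entries, report NaN rather than fail.
    return (count > 0) ? sum / static_cast<double>(count)
                       : std::numeric_limits<double>::quiet_NaN();
}

// Small self-check on a window with one missing value: 86, 98, NaN, 85.
void finite_mean_self_check() {
    const double nan = std::numeric_limits<double>::quiet_NaN();
    const double m = mean_of_finite({86.0, 98.0, nan, 85.0});
    assert(m > 89.66 && m < 89.67);  // 269 / 3
}
```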