Extracting a column with NA's from a bigmemory object in Rcpp
Question
I'm trying to create a function that extracts a column from a big.matrix object in Rcpp (so that it can be analyzed in C++ before bringing the results back to R), but I can't figure out how to get it to recognise NAs: they currently come back as -2147483648, as shown in my minimal example below. It would be even better if I could call the function GetMatrixCols (src/bigmemory.cpp) straight from Rcpp, but I've yet to discover a way to do that.
#include <Rcpp.h>
// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::depends(BH, bigmemory)]]
#include <bigmemory/MatrixAccessor.hpp>
#include <bigmemory/isna.hpp>
using namespace Rcpp;
// Logic for extracting column from a Big Matrix object
template <typename T>
NumericVector GetColumn_logic(XPtr<BigMatrix> pMat, MatrixAccessor<T> mat, int cn) {
    NumericVector nv(pMat->nrow());
    for(int i = 0; i < pMat->nrow(); i++) {
        if(isna(mat[cn][i])) {
            nv[i] = NA_INTEGER;
        } else {
            nv[i] = mat[cn][i];
        }
    }
    return nv;
}
//' Extract Column from a Big Matrix.
//'
//' @param pBigMat A bigmemory object address.
//' @param colNum Column Number to extract. Indexing starts from zero.
//' @export
// [[Rcpp::export]]
NumericVector GetColumn(SEXP pBigMat, int colNum) {
    XPtr<BigMatrix> xpMat(pBigMat);
    switch(xpMat->matrix_type()) {
        case 1: return GetColumn_logic(xpMat, MatrixAccessor<char>(*xpMat), colNum);
        case 2: return GetColumn_logic(xpMat, MatrixAccessor<short>(*xpMat), colNum);
        case 4: return GetColumn_logic(xpMat, MatrixAccessor<int>(*xpMat), colNum);
        case 6: return GetColumn_logic(xpMat, MatrixAccessor<float>(*xpMat), colNum);
        case 8: return GetColumn_logic(xpMat, MatrixAccessor<double>(*xpMat), colNum);
        default: throw Rcpp::exception("Unknown type detected for big.matrix object!");
    }
}
/*** R
bm <- bigmemory::as.big.matrix(as.matrix(reshape2::melt(matrix(c(1:4,NA,6:20),4,5))))
bigmemory:::CGetType(bm@address)
bigmemory:::GetCols.bm(bm, 3)
GetColumn(bm@address, 2)
*/
Recommended answer
That's a great one! Stay with me for a moment:
tl;dr: It works once fixed:
R> sourceCpp("/tmp/bigmemEx.cpp")
R> bm <- bigmemory::as.big.matrix(as.matrix(reshape2::melt(matrix(c(1:4,NA,6:20),4,5))))
R> bigmemory:::CGetType(bm@address)
[1] 4
R> bigmemory:::GetCols.bm(bm, 3)
[1] 1 2 3 4 NA 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
R> GetColumn(bm@address, 2)
[1] 1 2 3 4 NA 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
R>
The trouble starts on the inside. When you create your matrix as
matrix(c(1:4,NA,6:20),4,5)
what do you get? Integers!
R> matrix(c(1:4,NA,6:20),4,5)
     [,1] [,2] [,3] [,4] [,5]
[1,]    1   NA    9   13   17
[2,]    2    6   10   14   18
[3,]    3    7   11   15   19
[4,]    4    8   12   16   20
R> class(matrix(c(1:4,NA,6:20),4,5))
[1] "matrix"
R> typeof(matrix(c(1:4,NA,6:20),4,5))
[1] "integer"
R>
Not a problem per se, but it becomes one once you remember that the IEEE 754 standard defines NaN for floating point only (correct me if I'm wrong).
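To make the floating-point-only point concrete, here is a small sketch in plain C++ (no Rcpp required). The helper name type_has_nan is made up for illustration and is not part of Rcpp or bigmemory; it simply asks the standard library whether a type has a NaN representation at all.

```cpp
#include <limits>

// Hypothetical helper (not part of Rcpp or bigmemory): reports whether
// the arithmetic type T has a quiet NaN representation.
template <typename T>
constexpr bool type_has_nan() {
    return std::numeric_limits<T>::has_quiet_NaN;
}
```

On an IEEE 754 platform, type_has_nan<double>() and type_has_nan<float>() are true, while type_has_nan<int>() is false. That is why a missing integer has to be encoded as a reserved sentinel value instead.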
The other issue is that you reflexively used NumericVector in your code, but operate on integers. R has NA for both floating point and integer (and NaN for floating point), but 'normal libraries' outside of R do not. R's integer NA is just the sentinel value -2147483648, so once it is stored in a NumericVector it becomes an ordinary double. And since a big.matrix by design represents data outside of R, you're stuck.
The fix is simple enough: use IntegerVector (or equivalently convert your integer data on input). Below is my altered version of your code.
// -*- mode: C++; c-indent-level: 4; c-basic-offset: 4; indent-tabs-mode: nil; -*-
#include <Rcpp.h>
// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::depends(BH, bigmemory)]]
#include <bigmemory/MatrixAccessor.hpp>
#include <bigmemory/isna.hpp>
using namespace Rcpp;
// Logic for extracting column from a Big Matrix object
template <typename T>
IntegerVector GetColumn_logic(XPtr<BigMatrix> pMat, MatrixAccessor<T> mat, int cn) {
    IntegerVector nv(pMat->nrow());
    for(int i = 0; i < pMat->nrow(); i++) {
        if(isna(mat[cn][i])) {
            nv[i] = NA_INTEGER;
        } else {
            nv[i] = mat[cn][i];
        }
    }
    return nv;
}
//' Extract Column from a Big Matrix.
//'
//' @param pBigMat A bigmemory object address.
//' @param colNum Column Number to extract. Indexing starts from zero.
//' @export
// [[Rcpp::export]]
IntegerVector GetColumn(SEXP pBigMat, int colNum) {
    XPtr<BigMatrix> xpMat(pBigMat);
    switch(xpMat->matrix_type()) {
        case 1: return GetColumn_logic(xpMat, MatrixAccessor<char>(*xpMat), colNum);
        case 2: return GetColumn_logic(xpMat, MatrixAccessor<short>(*xpMat), colNum);
        case 4: return GetColumn_logic(xpMat, MatrixAccessor<int>(*xpMat), colNum);
        case 6: return GetColumn_logic(xpMat, MatrixAccessor<float>(*xpMat), colNum);
        case 8: return GetColumn_logic(xpMat, MatrixAccessor<double>(*xpMat), colNum);
        default: throw Rcpp::exception("Unknown type detected for big.matrix object!");
    }
}
/*** R
bm <- bigmemory::as.big.matrix(as.matrix(reshape2::melt(matrix(c(1:4,NA,6:20),4,5))))
bigmemory:::CGetType(bm@address)
bigmemory:::GetCols.bm(bm, 3)
GetColumn(bm@address, 2)
*/