Extracting a column with NA's from a bigmemory object in Rcpp


Question

I'm trying to write a function that extracts a column from a big.matrix object in Rcpp (so that it can be analyzed in C++ before the results are brought back to R), but I can't figure out how to get it to recognise NAs: they currently come back as -2147483648, as shown in the minimal example below. It would be even better if I could call the function GetMatrixCols (src/bigmemory.cpp) directly from Rcpp, but I have yet to find a way to do that.

#include <Rcpp.h>
// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::depends(BH, bigmemory)]]
#include <bigmemory/MatrixAccessor.hpp>
#include <bigmemory/isna.hpp>
using namespace Rcpp;

//Logic for extracting column from a Big Matrix object
template <typename T>
NumericVector GetColumn_logic(XPtr<BigMatrix> pMat,  MatrixAccessor<T> mat,   int cn) {
  NumericVector nv(pMat->nrow());
  for(int i = 0; i < pMat->nrow(); i++) {
    if(isna(mat[cn][i])) {
      nv[i] = NA_INTEGER;
    } else {
      nv[i] = mat[cn][i];
    }
  }
  return nv;
}

//' Extract Column from a Big Matrix.
//' 
//' @param pBigMat A bigmemory object address.
//' @param colNum Column Number to extract. Indexing starts from zero.
//' @export
// [[Rcpp::export]]
NumericVector GetColumn(SEXP pBigMat, int colNum) {
  XPtr<BigMatrix> xpMat(pBigMat);

  switch(xpMat->matrix_type()) {
    case 1: return GetColumn_logic(xpMat, MatrixAccessor<char>(*xpMat), colNum);
    case 2: return GetColumn_logic(xpMat, MatrixAccessor<short>(*xpMat), colNum);
    case 4: return GetColumn_logic(xpMat, MatrixAccessor<int>(*xpMat), colNum);
    case 6: return GetColumn_logic(xpMat, MatrixAccessor<float>(*xpMat), colNum);
    case 8: return GetColumn_logic(xpMat, MatrixAccessor<double>(*xpMat), colNum);
    default: throw Rcpp::exception("Unknown type detected for big.matrix object!");
  }
}

/*** R
bm <- bigmemory::as.big.matrix(as.matrix(reshape2::melt(matrix(c(1:4,NA,6:20),4,5))))
bigmemory:::CGetType(bm@address)
bigmemory:::GetCols.bm(bm, 3)
GetColumn(bm@address, 2)
*/

Recommended answer

That's a great one! Stay with me for a moment:

tl;dr: It works once fixed:

R> sourceCpp("/tmp/bigmemEx.cpp")

R> bm <- bigmemory::as.big.matrix(as.matrix(reshape2::melt(matrix(c(1:4,NA,6:20),4,5))))

R> bigmemory:::CGetType(bm@address)
[1] 4

R> bigmemory:::GetCols.bm(bm, 3)
 [1]  1  2  3  4 NA  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

R> GetColumn(bm@address, 2)
 [1]  1  2  3  4 NA  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
R> 

The trouble starts on the inside. When you create your matrix as

matrix(c(1:4,NA,6:20),4,5)

what do you get? Integers!

R> matrix(c(1:4,NA,6:20),4,5)
     [,1] [,2] [,3] [,4] [,5]
[1,]    1   NA    9   13   17
[2,]    2    6   10   14   18
[3,]    3    7   11   15   19
[4,]    4    8   12   16   20
R> class(matrix(c(1:4,NA,6:20),4,5))
[1] "matrix"
R> typeof(matrix(c(1:4,NA,6:20),4,5))
[1] "integer"
R> 

This is not a problem per se, but it becomes one once you remember that the IEEE 754 standard defines NaN for floating point only (correct me if I'm wrong).

The other issue is that you reflexively used NumericVector in your code while operating on integers. R has NA for both floating point and integer (and NaN for floating point), but 'normal libraries' outside of R do not. R encodes the integer NA as the smallest int, so assigning NA_INTEGER to an element of a NumericVector simply stores that number as an ordinary double, which is the -2147483648 you are seeing. And since a big.matrix by design represents things outside of R, you're stuck.
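
To make the mismatch concrete, here is a minimal sketch in plain C++ with no Rcpp dependency. The constant kNaInteger and the helpers copy_to_double and copy_to_int are hypothetical names introduced only for this illustration; kNaInteger stands in for NA_INTEGER on the assumption of a 32-bit int. It mimics the question's loop (integer source, double destination) next to an integer destination:

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Assumption: stand-in for R's NA_INTEGER, which R defines as the smallest int.
constexpr int kNaInteger = std::numeric_limits<int>::min();

// Hypothetical helper mirroring the question's loop: integer data copied
// into a vector of doubles, with the integer sentinel written for "missing".
std::vector<double> copy_to_double(const std::vector<int>& column) {
    std::vector<double> out(column.size());
    for (std::size_t i = 0; i < column.size(); ++i) {
        // The sentinel is converted like any other int, so it ends up as
        // an ordinary, non-missing double.
        out[i] = (column[i] == kNaInteger) ? kNaInteger : column[i];
    }
    return out;
}

// Hypothetical helper with an integer destination: the sentinel survives
// unchanged, so an integer-aware consumer can still recognise it.
std::vector<int> copy_to_int(const std::vector<int>& column) {
    std::vector<int> out(column.size());
    for (std::size_t i = 0; i < column.size(); ++i) {
        out[i] = column[i];
    }
    return out;
}

// Small self-check that documents the expected behaviour.
inline void self_check() {
    const std::vector<int> column = {1, kNaInteger, 3};
    const std::vector<double> as_double = copy_to_double(column);
    assert(as_double[1] == as_double[1]);  // not NaN: just a large negative number
    assert(copy_to_int(column)[1] == kNaInteger);
}
```

In the double vector the "missing" slot is just a large negative number, whereas the integer vector keeps the sentinel that R interprets as NA for integer data.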

The fix is simple enough: use IntegerVector (or, equivalently, convert your integer data on input). Below is my altered version of your code.

// -*- mode: C++; c-indent-level: 4; c-basic-offset: 4; indent-tabs-mode: nil; -*-

#include <Rcpp.h>

// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::depends(BH, bigmemory)]]

#include <bigmemory/MatrixAccessor.hpp>
#include <bigmemory/isna.hpp>

using namespace Rcpp;

//Logic for extracting column from a Big Matrix object
template <typename T>
IntegerVector GetColumn_logic(XPtr<BigMatrix> pMat,  MatrixAccessor<T> mat,   int cn) {
    IntegerVector nv(pMat->nrow());
    for(int i = 0; i < pMat->nrow(); i++) {
        if(isna(mat[cn][i])) {
            nv[i] = NA_INTEGER;
        } else {
            nv[i] = mat[cn][i];
        }
    }
    return nv;
}

//' Extract Column from a Big Matrix.
//' 
//' @param pBigMat A bigmemory object address.
//' @param colNum Column Number to extract. Indexing starts from zero.
//' @export
// [[Rcpp::export]]
IntegerVector GetColumn(SEXP pBigMat, int colNum) {
    XPtr<BigMatrix> xpMat(pBigMat);

    switch(xpMat->matrix_type()) {
    case 1: return GetColumn_logic(xpMat, MatrixAccessor<char>(*xpMat), colNum);
    case 2: return GetColumn_logic(xpMat, MatrixAccessor<short>(*xpMat), colNum);
    case 4: return GetColumn_logic(xpMat, MatrixAccessor<int>(*xpMat), colNum);
    case 6: return GetColumn_logic(xpMat, MatrixAccessor<float>(*xpMat), colNum);
    case 8: return GetColumn_logic(xpMat, MatrixAccessor<double>(*xpMat), colNum);
    default: throw Rcpp::exception("Unknown type detected for big.matrix object!");
    }
}

/*** R
bm <- bigmemory::as.big.matrix(as.matrix(reshape2::melt(matrix(c(1:4,NA,6:20),4,5))))
bigmemory:::CGetType(bm@address)
bigmemory:::GetCols.bm(bm, 3)
GetColumn(bm@address, 2)
*/
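
If a floating-point result is preferred over IntegerVector, the conversion route could look roughly like the following sketch. It is plain C++ rather than Rcpp, and both kNaInteger and column_as_double are hypothetical names used only for this illustration. A quiet NaN stands in for the missing value here; in actual Rcpp code the constant to write into a NumericVector would be NA_REAL.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Assumption: stand-in for R's NA_INTEGER (the smallest int).
constexpr int kNaInteger = std::numeric_limits<int>::min();

// Hypothetical helper: copy an integer column into doubles, translating the
// integer sentinel into a floating-point "missing" marker instead of letting
// it be converted as a regular number.
std::vector<double> column_as_double(const std::vector<int>& column) {
    // Assumption: quiet NaN plays the role that NA_REAL would play in Rcpp.
    const double missing = std::numeric_limits<double>::quiet_NaN();
    std::vector<double> out(column.size());
    for (std::size_t i = 0; i < column.size(); ++i) {
        out[i] = (column[i] == kNaInteger) ? missing
                                           : static_cast<double>(column[i]);
    }
    return out;
}
```

The point of the sketch is only that the missing-value marker has to match the element type of the destination: an integer sentinel for integer output, a NaN-based marker for double output.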
