RStudio crashes with Rcpp (reproducible code)


Problem description


@user3759195 wrote a post https://stackoverflow.com/questions/24322356/rstudio-crashes-and-it-does-not-reproduce about RStudio crashing with Rcpp, but didn't give a reproducible example. @KevinUshey mentioned in the comments that we have to PROTECT the objects returned by wrap within the code.


I took the liberty of posting two alternatives to the split.data.frame function, written in Rcpp:

* Version that does NOT crash RStudio *

#include <Rcpp.h>

using namespace Rcpp;
using namespace std;

//[[Rcpp::export]]
List splitDataFrameCpp(DataFrame x,NumericVector y) {
  int nRows=x.nrows();
  int nCols=x.size();

  std::map<double,vector<double> > z;
  for (int i=0;i<nCols;i++) {
    std::vector<double> tmp=Rcpp::as<std::vector<double> > (x[i]);
    for (int j=0;j<nRows;j++) {
      z[y[j]].push_back(tmp[j]);      
    }
  }

  std::vector<double> yunq=Rcpp::as<std::vector<double> > (sort_unique(y));
  std::map<double, DataFrame> z1;
  for (int i=0;i<int(yunq.size());i++) {
    NumericVector tmp1=wrap(z[yunq[i]]);   // *** DEFINING INSIDE LOOP ***
    tmp1.attr("dim")=Dimension(int(tmp1.size())/nCols,nCols);
    DataFrame tmp2(wrap(tmp1));   // *** DEFINING INSIDE LOOP ***
    tmp2.attr("names")=x.attr("names");
    z1[yunq[i]]=tmp2;
  }
  return wrap(z1);  
}

* Version that crashes RStudio *

#include <Rcpp.h>

using namespace Rcpp;
using namespace std;

//[[Rcpp::export]]
List splitDataFrameCpp(DataFrame x,NumericVector y) {
  int nRows=x.nrows();
  int nCols=x.size();

  std::map<double,vector<double> > z;
  for (int i=0;i<nCols;i++) {
    std::vector<double> tmp=Rcpp::as<std::vector<double> > (x[i]);
    for (int j=0;j<nRows;j++) {
      z[y[j]].push_back(tmp[j]);      
    }
  }

  std::vector<double> yunq=Rcpp::as<std::vector<double> > (sort_unique(y));
  std::map<double, DataFrame> z1;

  NumericVector tmp1;    // *** DEFINING OUTSIDE LOOP ***
  DataFrame tmp2;    // *** DEFINING OUTSIDE LOOP ***

  for (int i=0;i<int(yunq.size());i++) {
    tmp1=wrap(z[yunq[i]]);
    tmp1.attr("dim")=Dimension(int(tmp1.size())/nCols,nCols);
    tmp2=wrap(tmp1);
    tmp2.attr("names")=x.attr("names");
    z1[yunq[i]]=tmp2;
  }    
  return wrap(z1);      
}


The main difference between the two versions is that in one, tmp1 and tmp2 are defined inside the loop, and in the other, outside the loop.
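Rcpp classes such as NumericVector and DataFrame are thin handles to an underlying R object. Copying one shares that object instead of duplicating it, and Rcpp's clone() is what makes a deep copy. Assigning the result of wrap() rebinds the handle to a different R object. The plain C++ analogy below uses std::shared_ptr to play the role of such a handle. Under that analogy, declaring the temporary inside or outside the loop produces identical maps. So if the second version does crash, the cause is more plausibly how the intermediate R objects are protected from R's garbage collector (the PROTECT point raised in the comments) than the loop logic itself. This is a sketch under that assumption, not a diagnosis of the crash; the Handle type and the makeGroup helper are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <vector>

// Hypothetical stand-in for an Rcpp handle: copies share one underlying object,
// much as copying an Rcpp::NumericVector shares the same R vector.
using Handle = std::shared_ptr<std::vector<double>>;

// Hypothetical helper playing the role of wrap(): builds a fresh object per group.
Handle makeGroup(double key) {
    return std::make_shared<std::vector<double>>(3, key);
}

// Like the first version: the temporary is declared inside the loop.
std::map<double, Handle> insideLoop(const std::vector<double>& keys) {
    std::map<double, Handle> out;
    for (double k : keys) {
        Handle tmp = makeGroup(k);
        out[k] = tmp;                       // the map keeps its own copy of the handle
        assert(out[k].get() == tmp.get());  // both copies point at the same object
    }
    return out;
}

// Like the second version: the temporary is declared once and reassigned.
std::map<double, Handle> outsideLoop(const std::vector<double>& keys) {
    std::map<double, Handle> out;
    Handle tmp;
    for (double k : keys) {
        tmp = makeGroup(k);                 // rebinds tmp; earlier map entries are untouched
        out[k] = tmp;
        assert(out[k].get() == tmp.get());
    }
    return out;
}
```

Calling insideLoop and outsideLoop with the same keys gives maps whose entries hold equal contents, each still tied to its own group. Reassigning tmp in the second function never overwrites what was already stored.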


  1. Can anyone explain why the second version crashes (and what can be changed so that RStudio does NOT crash)? I'm still new to C++ and mostly write Rcpp code by looking at examples on SO or the Rcpp Gallery website, so I would like to understand this behavior a little better.


Also, as a side benefit, if anyone can recommend changes to make the code faster, that would be great. Based on some test cases I used, the version that does NOT crash is currently about 2-3 times faster than R's split.data.frame function.
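One possible speed-up is offered here as an untested suggestion, not a measured result. Both versions do a std::map lookup for every cell, which is nRows × nCols lookups. Grouping the row indices once (nRows lookups) and then gathering each column by index avoids the repeated lookups. The sketch below uses only the C++ standard library and models a data frame as a std::vector of numeric columns in place of Rcpp::DataFrame. Column, Frame, groupRows, gatherColumn and splitFrame are hypothetical names. As with split(), groups come out sorted by key, and rows keep their original order within each group.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// A data frame modelled as a list of numeric columns (stand-in for Rcpp::DataFrame).
using Column = std::vector<double>;
using Frame = std::vector<Column>;

// One map lookup per row: record the 0-based row indices belonging to each key.
// std::map keeps keys sorted, matching the order of split()'s output.
std::map<double, std::vector<std::size_t>> groupRows(const std::vector<double>& y) {
    std::map<double, std::vector<std::size_t>> groups;
    for (std::size_t j = 0; j < y.size(); ++j) {
        groups[y[j]].push_back(j);
    }
    return groups;
}

// Copy the selected rows of one column, preserving their original order.
Column gatherColumn(const Column& col, const std::vector<std::size_t>& rows) {
    Column out;
    out.reserve(rows.size());
    for (std::size_t r : rows) {
        out.push_back(col[r]);
    }
    return out;
}

// Split every column of x by the grouping vector y, one Frame per key.
std::map<double, Frame> splitFrame(const Frame& x, const std::vector<double>& y) {
    std::map<double, Frame> result;
    const auto groups = groupRows(y);
    for (const auto& kv : groups) {
        Frame part;
        part.reserve(x.size());
        for (const Column& col : x) {
            assert(col.size() == y.size());  // each column needs one entry per row
            part.push_back(gatherColumn(col, kv.second));
        }
        result[kv.first] = std::move(part);
    }
    return result;
}
```

In an Rcpp version, each column could be read straight from the DataFrame (for example as a NumericVector) instead of first being copied into a std::vector<double>. Whether this is actually faster would need to be checked with microbenchmark.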

Sample test case:

> testDF
   V1 V2 V3 V4 V5 V6
1   1  5  4  1  3  2
2   2  1  5  4  1  3
3   2  2  1  5  4  1
4   3  2  2  1  5  4
5   1  3  2  2  1  5
6   4  1  3  2  2  1
7   1  5  4  1  3  2
8   2  1  5  4  1  3
9   2  2  1  5  4  1
10  3  2  2  1  5  4
11  1  3  2  2  1  5
12  4  1  3  2  2  1

> testSp<-c(1,1,1,2,2,2,3,4,4,3,3,5)

> split(testDF,testSp)     OR  > splitDataFrameCpp(testDF,testSp)     
$`1`
  V1 V2 V3 V4 V5 V6
1  1  5  4  1  3  2
2  2  1  5  4  1  3
3  2  2  1  5  4  1

$`2`
  V1 V2 V3 V4 V5 V6
4  3  2  2  1  5  4
5  1  3  2  2  1  5
6  4  1  3  2  2  1

$`3`
   V1 V2 V3 V4 V5 V6
7   1  5  4  1  3  2
10  3  2  2  1  5  4
11  1  3  2  2  1  5

$`4`
  V1 V2 V3 V4 V5 V6
8  2  1  5  4  1  3
9  2  2  1  5  4  1

$`5`
   V1 V2 V3 V4 V5 V6
12  4  1  3  2  2  1
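Both Rcpp versions rely on the same buffer layout. The first loop walks column by column, so z[key] receives the group's V1 values, then its V2 values, and so on. That is R's column-major order, which is why setting the dim attribute to (size / nCols, nCols) yields a matrix with one row per group member. The plain C++ sketch below re-creates that step for the first two columns of the test data above and reads element (r, c) back at index c * nRowsGroup + r. buildBuffers and cellAt are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

using Column = std::vector<double>;

// Mirrors the first loop of splitDataFrameCpp: for each column, append every
// row's value to the buffer of that row's group. Each buffer ends up holding
// the group's column-0 values, then its column-1 values, and so on.
std::map<double, std::vector<double>> buildBuffers(const std::vector<Column>& cols,
                                                   const std::vector<double>& y) {
    std::map<double, std::vector<double>> z;
    for (const Column& col : cols) {
        for (std::size_t j = 0; j < y.size(); ++j) {
            z[y[j]].push_back(col[j]);
        }
    }
    return z;
}

// Read element (r, c) of a column-major buffer with nCols columns, the way R
// interprets the vector once dim = c(size / nCols, nCols) is set.
double cellAt(const std::vector<double>& buf, std::size_t nCols,
              std::size_t r, std::size_t c) {
    assert(nCols > 0 && buf.size() % nCols == 0);
    std::size_t nRowsGroup = buf.size() / nCols;
    assert(r < nRowsGroup && c < nCols);
    return buf[c * nRowsGroup + r];
}
```

For group 3 (rows 7, 10 and 11 of testDF), the buffer built from V1 and V2 is {1, 3, 1, 5, 2, 3}. cellAt(buffer, 2, 1, 0) returns 3, which is the V1 value of row 10.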

Microbenchmark results for this test case:

> microbenchmark(t1<-split(testDF,testSp),t2<-splitDataFrameCpp(testDF,testSp))
Unit: microseconds
                                   expr     min      lq   median       uq      max neval
             t1 <- split(testDF, test2) 343.181 365.562 372.8760 387.9430 1027.786   100
 t2 <- splitDataFrameCpp(testDF, test2) 177.881 190.315 200.5545 208.4545  870.093   100

* EDIT *

Added sessionInfo:

> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] microbenchmark_1.3-0

loaded via a namespace (and not attached):
[1] Rcpp_0.11.1 tools_3.1.0


Also, the columns of testDF were created as numeric in R, not integer.

Recommended answer


For what it is worth, here is a complete example you can run directly with sourceCpp(). Like Kevin and Romain noted, it does not blow up for me either.

#include <Rcpp.h>

using namespace Rcpp;
using namespace std;

//[[Rcpp::export]]
List splitDataFrameCppA(DataFrame x,NumericVector y) {
  int nRows=x.nrows();
  int nCols=x.size();

  std::map<double,vector<double> > z;
  for (int i=0;i<nCols;i++) {
    std::vector<double> tmp=Rcpp::as<std::vector<double> > (x[i]);
    for (int j=0;j<nRows;j++) {
      z[y[j]].push_back(tmp[j]);      
    }
  }

  std::vector<double> yunq=Rcpp::as<std::vector<double> > (sort_unique(y));
  std::map<double, DataFrame> z1;
  for (int i=0;i<int(yunq.size());i++) {
    NumericVector tmp1=wrap(z[yunq[i]]);   // *** DEFINING INSIDE LOOP ***
    tmp1.attr("dim")=Dimension(int(tmp1.size())/nCols,nCols);
    DataFrame tmp2(wrap(tmp1));   // *** DEFINING INSIDE LOOP ***
    tmp2.attr("names")=x.attr("names");
    z1[yunq[i]]=tmp2;
  }
  return wrap(z1);  
}


//[[Rcpp::export]]
List splitDataFrameCppB(DataFrame x,NumericVector y) {
  int nRows=x.nrows();
  int nCols=x.size();

  std::map<double,vector<double> > z;
  for (int i=0;i<nCols;i++) {
    std::vector<double> tmp=Rcpp::as<std::vector<double> > (x[i]);
    for (int j=0;j<nRows;j++) {
      z[y[j]].push_back(tmp[j]);      
    }
  }

  std::vector<double> yunq=Rcpp::as<std::vector<double> > (sort_unique(y));
  std::map<double, DataFrame> z1;

  NumericVector tmp1;    // *** DEFINING OUTSIDE LOOP ***
  DataFrame tmp2;    // *** DEFINING OUTSIDE LOOP ***

  for (int i=0;i<int(yunq.size());i++) {
    tmp1=wrap(z[yunq[i]]);
    tmp1.attr("dim")=Dimension(int(tmp1.size())/nCols,nCols);
    tmp2=wrap(tmp1);
    tmp2.attr("names")=x.attr("names");
    z1[yunq[i]]=tmp2;
  }    
  return wrap(z1);      
}


/*** R

testDF <- read.table(textConnection("
1  5  4  1  3  2
2  1  5  4  1  3
2  2  1  5  4  1
3  2  2  1  5  4
1  3  2  2  1  5
4  1  3  2  2  1
1  5  4  1  3  2
2  1  5  4  1  3
2  2  1  5  4  1
3  2  2  1  5  4
1  3  2  2  1  5
4  1  3  2  2  1
"))

testSp <- c(1,1,1,2,2,2,3,4,4,3,3,5)


str(splitDataFrameCppA(testDF, testSp))
str(splitDataFrameCppB(testDF, testSp))

library(microbenchmark)
microbenchmark(split(testDF,testSp),
               splitDataFrameCppA(testDF,testSp),
               splitDataFrameCppB(testDF,testSp))

*/


The benchmark is about even between your two versions:

R> library(microbenchmark)

R> microbenchmark(split(testDF,testSp),
+                splitDataFrameCppA(testDF,testSp),
+                splitDataFrameCppB(testDF,testSp))
Unit: microseconds
                               expr     min      lq  median      uq      max neval
              split(testDF, testSp) 687.271 724.748 745.287 791.574 2373.283   100
 splitDataFrameCppA(testDF, testSp) 380.781 393.161 406.686 421.469  491.803   100
 splitDataFrameCppB(testDF, testSp) 377.959 393.391 405.476 429.947 2052.193   100
R> 
R> 
