Running compiled C++ code with Rcpp
I have been working my way through Dirk Eddelbuettel's Rcpp tutorial here:
http://www.rinfinance.com/agenda/
I have learned how to save a C++ file in a directory and run it from within R. The C++ file I am running is called 'logabs2.cpp' and its contents are directly from one of Dirk's slides:
#include <Rcpp.h>
using namespace Rcpp;
inline double f(double x) { return ::log(::fabs(x)); }
// [[Rcpp::export]]
std::vector<double> logabs2(std::vector<double> x) {
    std::transform(x.begin(), x.end(), x.begin(), f);
    return x;
}
I run it with this R code:
library(Rcpp)
sourceCpp("c:/users/mmiller21/simple r programs/logabs2.cpp")
logabs2(seq(-5, 5, by=2))
# [1] 1.609438 1.098612 0.000000 0.000000 1.098612 1.609438
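As a side note, the numeric part of logabs2 above does not depend on R, so it can be checked on its own. The following is only a minimal sketch in plain standard C++, assuming no Rcpp at all; the name logabs2_plain is hypothetical and is not part of Dirk's slides. There is deliberately no main(), because sourceCpp builds a shared library that R loads rather than a standalone program.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Same scalar kernel as in logabs2.cpp: log of the absolute value.
inline double f(double x) { return std::log(std::fabs(x)); }

// Plain C++ stand-in for logabs2 (hypothetical name, no Rcpp involved).
// The vector is taken by value, so the caller's data is left untouched.
std::vector<double> logabs2_plain(std::vector<double> x) {
    std::transform(x.begin(), x.end(), x.begin(), f);
    return x;
}
```

Called with the values -5, -3, -1, 1, 3, 5 (what seq(-5, 5, by=2) produces), this should give the same six numbers as the R output above.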
I am running the code on a Windows 7 machine from within the R GUI that seems to install by default. I also installed the most recent version of Rtools. The above R code seems to take a relatively long time to run. I suspect most of that time is devoted to compiling the C++ code and that once the C++ code is compiled it runs very quickly. Microbenchmark certainly suggests that Rcpp reduces computation time.
I have never used C++ until now, but I know that when I compile C code I get an *.exe file. I have searched my hard drive for a file called logabs2.exe but cannot find one. I am wondering whether the above C++ code might run even faster if a logabs2.exe file were created. Is it possible to create a logabs2.exe file, store it in a folder somewhere, and then have Rcpp call that file whenever I want to use it? I do not know whether that makes sense. If I could store a C++ function in an *.exe file then perhaps I would not have to compile the function every time I wanted to use it with Rcpp, and then perhaps the Rcpp code would be even faster.
Sorry if this question does not make sense or is a duplicate. If it is possible to store the C++ function as an *.exe file I am hoping someone will show me how to modify my R code above to run it. Thank you for any help with this or for setting me straight on why what I suggest is not possible or recommended.
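For context on the *.exe idea: compiled code that R calls normally lives in a shared library (a *.dll on Windows) rather than in an executable. The sketch below is an assumption-laden illustration, not what Rcpp generates. It shows a function with C linkage whose pointer-only, void-returning signature follows the convention of R's .C() interface; the name logabs_c is hypothetical.

```cpp
#include <cmath>

// C linkage keeps the symbol name unmangled so it can be looked up by name
// in a shared library. The pointer-only, void-returning signature follows
// the convention of R's .C() interface. The name logabs_c is hypothetical.
extern "C" void logabs_c(double *x, int *n) {
    for (int i = 0; i < *n; ++i) {
        x[i] = std::log(std::fabs(x[i]));  // overwrite each element in place
    }
}
```

Assuming the code is saved as logabs_c.cpp (a hypothetical file name), R CMD SHLIB logabs_c.cpp should build the library once. It could then be loaded in later sessions with dyn.load() and called with .C("logabs_c", x = as.double(x), n = as.integer(length(x)))$x, without recompiling each time.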
I look forward to seeing Dirk's new book.
Thank you to user1981275, Dirk Eddelbuettel and Romain Francois for their responses. Below is how I compiled a C++ file and created a *.dll, then called and used that *.dll file inside R.
Step 1. I created a new folder called 'c:\users\mmiller21\myrpackages' and pasted the file 'logabs2.cpp' into that new folder. The file 'logabs2.cpp' was created as described in my original post.
Step 2. Inside the new folder I created a new R package called 'logabs2' using an R file I wrote called 'new package creation.r'. The contents of 'new package creation.r' are:
setwd('c:/users/mmiller21/myrpackages/')
library(Rcpp)
Rcpp.package.skeleton("logabs2", example_code = FALSE, cpp_files = c("logabs2.cpp"))
I found the above syntax for Rcpp.package.skeleton on one of Hadley Wickham's websites: https://github.com/hadley/devtools/wiki/Rcpp
Step 3. I installed the new R package "logabs2" in R using the following line in the DOS command window:
C:\Program Files\R\R-3.0.1\bin\x64>R CMD INSTALL -l c:\users\mmiller21\documents\r\win-library\3.0\ c:\users\mmiller21\myrpackages\logabs2
where:
the location of the rcmd.exe file is: C:\Program Files\R\R-3.0.1\bin\x64>
the location of installed R packages on my computer is: c:\users\mmiller21\documents\r\win-library\3.0\
and the location of my new R package prior to being installed is: c:\users\mmiller21\myrpackages\
Syntax used in the DOS command window was found by trial and error and may not be ideal. At some point I pasted a copy of 'logabs2.cpp' in 'C:\Program Files\R\R-3.0.1\bin\x64>' but I do not think that mattered.
Step 4. After installing the new R package I ran it using an R file I named 'new package usage.r' in the 'c:/users/mmiller21/myrpackages/' folder (although I do not think the folder was important). The contents of 'new package usage.r' are:
library(logabs2)
logabs2(seq(-5, 5, by=2))
The output was:
# [1] 1.609438 1.098612 0.000000 0.000000 1.098612 1.609438
This file loaded the package Rcpp without me asking.
In this case base R was faster, assuming I did this correctly:
#> microbenchmark(logabs2(seq(-5, 5, by=2)), times = 100)
#Unit: microseconds
# expr min lq median uq max neval
# logabs2(seq(-5, 5, by = 2)) 43.086 44.453 50.6075 69.756 190.803 100
#> microbenchmark(log(abs(seq(-5, 5, by=2))), times=100)
#Unit: microseconds
# expr min lq median uq max neval
# log(abs(seq(-5, 5, by = 2))) 38.298 38.982 39.666 40.35 173.023 100
However, using the dll file was faster than calling the external cpp file:
system.time(
cppFunction("
NumericVector logabs(NumericVector x) {
return log(abs(x));
}
")
)
# user system elapsed
# 0.06 0.08 5.85
Although base R seems faster or as fast as the *.dll file in this case, I have no doubt that using the *.dll file with Rcpp will be faster than base R in most cases.
This was my first attempt creating an R package or using Rcpp and no doubt I did not use the most efficient methods. Also, I apologize for any typographic errors in this post.
EDIT
In a comment below I think Romain Francois suggested I modify the *.cpp file to the following:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericVector logabs(NumericVector x) {
    return log(abs(x));
}
and recreate my R package, which I have now done. I then compared base R against my new package using the following code:
library(logabs)
logabs(seq(-5, 5, by=2))
log(abs(seq(-5, 5, by=2)))
library(microbenchmark)
microbenchmark(logabs(seq(-5, 5, by=2)), log(abs(seq(-5, 5, by=2))), times = 100000)
Base R is still a tiny bit faster or no different:
Unit: microseconds
expr min lq median uq max neval
logabs(seq(-5, 5, by = 2)) 42.401 45.137 46.505 69.073 39754.598 1e+05
log(abs(seq(-5, 5, by = 2))) 37.614 40.350 41.718 62.234 3422.133 1e+05
Perhaps this is because base R is already vectorized. I suspect with more complex functions base R will be much slower. Or perhaps I am still not using the most efficient approach, or perhaps I simply made an error somewhere.
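To illustrate what a "more complex function" might look like, here is a hedged sketch of a recurrence in which each output depends on the previous output, so a single vectorized call such as log(abs(x)) does not express it directly. It is plain standard C++ without Rcpp; the function name ewma and the parameter alpha are hypothetical choices for illustration, and nothing here has been benchmarked against R.

```cpp
#include <cstddef>
#include <vector>

// Exponentially weighted moving average (hypothetical example function).
// alpha in (0, 1] is the weight given to the newest observation.
std::vector<double> ewma(const std::vector<double>& x, double alpha) {
    std::vector<double> out(x.size());
    if (x.empty()) return out;
    out[0] = x[0];  // the first value seeds the recurrence
    for (std::size_t i = 1; i < x.size(); ++i) {
        // Each element depends on the previously computed element.
        out[i] = alpha * x[i] + (1.0 - alpha) * out[i - 1];
    }
    return out;
}
```

Whether a loop like this actually outperforms an R implementation would still need to be measured, for example with microbenchmark as above.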