Faster than scan() with Rcpp?
Problem description
Reading ~5x10^6 numeric values into R from a text file is relatively slow on my machine (a few seconds, and I read several such files), even with scan(..., what="numeric", nmax=5000) or similar tricks. Could it be worthwhile to try an Rcpp wrapper for this sort of task (e.g. Armadillo has a few utilities for reading text files)?
Or would I likely be wasting my time for little to no performance gain because of the expected interface overhead? I'm not sure what currently limits the speed (intrinsic machine performance, or something else?). It's a task that I typically repeat many times a day, and the file format is always the same: 1000 columns, around 5000 rows.
Here's a sample file to play with, if needed.
nr <- 5000  # number of rows
nc <- 1000  # number of columns
m <- matrix(round(rnorm(nr * nc), 3), nrow = nr)
cat(m[1, -1], "\n", file = "test.txt")  # first line is one value shorter
write.table(m[-1, ], file = "test.txt", append = TRUE,
            row.names = FALSE, col.names = FALSE)
Update: I tried read.csv.sql and also load("test.txt", arma::raw_ascii) using Armadillo, and both were slower than the scan solution.
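Before reaching for Rcpp, it may be worth timing scan() itself, since the question leaves open what limits the speed. The sketch below is only an illustration under stated assumptions: it uses a small, regular stand-in file (the names f, x_chr, x_num and the reduced dimensions are hypothetical), not the ragged test.txt above. It compares what="numeric", which is a character example and therefore makes scan() return strings, with what=numeric() plus a known value count n.

```r
# Small stand-in file with regular rows (assumption: not the ragged test.txt)
nr <- 200
nc <- 100
m <- matrix(round(rnorm(nr * nc), 3), nrow = nr)
f <- tempfile(fileext = ".txt")
write.table(m, file = f, row.names = FALSE, col.names = FALSE)

# what = "numeric" is a character example, so the values come back as strings
t_chr <- system.time(x_chr <- scan(f, what = "numeric", quiet = TRUE))

# what = numeric() parses to doubles; n tells scan() how many values to expect
t_num <- system.time(x_num <- scan(f, what = numeric(), n = nr * nc, quiet = TRUE))

# Compare elapsed times and the resulting types
print(rbind(character = t_chr, numeric = t_num)[, "elapsed"])
print(c(class(x_chr), class(x_num)))

# The file is written row by row, so refill the matrix by row
m2 <- matrix(x_num, nrow = nr, byrow = TRUE)
```

On the full-size files, running the same comparison should show whether type conversion or the parsing itself accounts for most of the time.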
I highly recommend checking out fread in the latest version of data.table. The version on CRAN (1.8.6) doesn't have fread yet (at the time of this post), so to get it you will need to install from the latest source at R-forge. See here.
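A minimal sketch of what that might look like, assuming an installed data.table version that provides fread(). The file here is a small, regular stand-in (the names tmp, dt and m2 are hypothetical); the sample file from the question has a shorter first line, which may need separate handling and is not covered here.

```r
library(data.table)  # assumption: a version that includes fread()

# Small stand-in file in which every row has the same number of fields
nr <- 50
nc <- 10
m <- matrix(round(rnorm(nr * nc), 3), nrow = nr)
tmp <- tempfile(fileext = ".txt")
write.table(m, file = tmp, row.names = FALSE, col.names = FALSE)

# Read the space-separated values; there is no header row in this format
dt <- fread(tmp, header = FALSE, sep = " ")

# Convert back to a plain numeric matrix without column names
m2 <- as.matrix(dt)
dimnames(m2) <- NULL

# Time the read step on its own for comparison with scan()
print(system.time(fread(tmp, header = FALSE, sep = " ")))
```

Wrapping the scan() call from the question in system.time() on the same file gives a direct comparison on your own machine.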