Faster than scan() with Rcpp?


Question


Reading ~5x10^6 numeric values into R from a text file is relatively slow on my machine (a few seconds, and I read several such files), even with scan(..., what="numeric", nmax=5000) or similar tricks. Would it be worthwhile to try an Rcpp wrapper for this sort of task (e.g. Armadillo has a few utilities for reading text files)? Or would I likely be wasting my time for little to no performance gain because of the expected interface overhead? I'm not sure what currently limits the speed (intrinsic machine performance, or something else?). It's a task I typically repeat many times a day, and the file format is always the same: 1000 columns, around 5000 rows.

Here's a sample file to play with, if needed.

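nr <- 5000  # number of rows
nc <- 1000  # number of columns

# random values rounded to 3 decimals, arranged as an nr x nc matrix
m <- matrix(round(rnorm(nr * nc), 3), nrow = nr)

cat(m[1, -1], "\n", file = "test.txt") # first line is shorter
# append the remaining rows without row or column names
write.table(m[-1, ], file = "test.txt", append = TRUE,
            row.names = FALSE, col.names = FALSE)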

Update: I tried read.csv.sql (from sqldf) and also Armadillo's load("test.txt", arma::raw_ascii); both were slower than the scan solution.
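For reference, a scan() baseline of the kind described above can be timed with system.time(). This is only a minimal sketch under assumptions not in the original post: it uses a small, fully regular temporary file (200 x 50, hypothetical sizes chosen so it runs quickly) instead of test.txt, and it passes what = numeric() rather than what = "numeric". As far as I can tell, a character string for `what` makes scan() return character values, so the numeric prototype is the safer choice.

```r
# Build a small regular file (hypothetical sizes, much smaller than the real data).
set.seed(1)
nr <- 200
nc <- 50
m <- matrix(round(rnorm(nr * nc), 3), nrow = nr)
path <- tempfile(fileext = ".txt")
write.table(m, file = path, row.names = FALSE, col.names = FALSE)

# what = numeric() asks scan() for doubles; n is the maximum number of values to read.
timing <- system.time(
  x <- scan(path, what = numeric(), n = nr * nc, quiet = TRUE)
)

# scan() returns the values in file order (row by row), so fill the matrix by row.
m2 <- matrix(x, nrow = nr, ncol = nc, byrow = TRUE)
print(timing)
```

Scaling nr and nc up to 5000 and 1000 should give a figure comparable to the few seconds mentioned above, which can then serve as the number any alternative has to beat.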

Answer

I highly recommend checking out fread in the latest version of data.table. At the time of this post, the version on CRAN (1.8.6) doesn't include fread yet, so you will need to install the latest source from R-forge to get it. See here.
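A minimal sketch of what the fread route might look like, assuming a data.table version that ships fread is installed. The file here is a small, fully regular temporary file (hypothetical sizes), not the test.txt from the question; the shorter first line of that sample file would need separate handling that this sketch does not attempt.

```r
# Small regular sample file (hypothetical sizes, for illustration only).
set.seed(1)
nr <- 200
nc <- 50
m <- matrix(round(rnorm(nr * nc), 3), nrow = nr)
path <- tempfile(fileext = ".txt")
write.table(m, file = path, row.names = FALSE, col.names = FALSE)

if (requireNamespace("data.table", quietly = TRUE)) {
  # fread returns a data.table with columns V1, V2, ...; header = FALSE
  # states explicitly that the first line is data, not column names.
  dt <- data.table::fread(path, header = FALSE, sep = " ")
  # Convert to a plain numeric matrix and drop the generated column names.
  m_fread <- unname(as.matrix(dt))
} else {
  m_fread <- NULL  # data.table is not installed; nothing to compare
}
```

Wrapping the fread call in system.time(), as one would for scan(), gives a direct comparison on the real 5000 x 1000 files.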
