Inconsistent speed of data.table's fread in R

Question

I am seeing inconsistent speed from data.table's fread function. I have two files of about 8 GB each, with (almost) the same content, yet the time needed to read them is strangely different.

 control.major  <-  fread("control.major.gff")$V6
 Read 19.8% of 98100000 rows
 Read 98100000 rows and 10 (of 10) columns from 7.947 GB file in 02:06:58
 control.minor  <-  fread("control.minor.gff")$V6  
 Read 98100000 rows and 10 (of 10) columns from 7.947 GB file in 00:03:15

I need to read the 6th column of each file, which is entirely numeric. Initially I found that fread was faster than

 scan(pipe("cut -f6  SNP.major.gff"),  sep="\n")

because the cut command was taking an awfully long time.

Why does fread behave inconsistently? Is there a faster way to read a single column?

Recommended answer

I've had a similar problem: the first time I ran fread it was very slow, but successive runs were much faster. In my case this was because I was working on a computer in my university's computer lab, so the data was not stored locally on my machine but on a network. Most of the time spent running fread therefore went to transferring the data across the network and into my local working memory. This was corroborated by timing the first run: user time + sys time << elapsed time.
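The timing check described above can be sketched with base R's system.time(). This is only a minimal illustration: time_breakdown is a hypothetical helper name, and the small temporary file read with read.delim stands in for the real fread("control.major.gff") call.

```r
# Hypothetical helper: compare CPU time (user + sys) with wall-clock time.
time_breakdown <- function(expr) {
  t <- system.time(expr)                      # expr is evaluated inside system.time
  cpu <- sum(t[c("user.self", "sys.self")])   # CPU time spent by this R process
  elapsed <- t[["elapsed"]]                   # wall-clock time
  # A multi-threaded reader can report more CPU time than elapsed time,
  # so the difference is clamped at zero.
  c(cpu = cpu, elapsed = elapsed, waiting = max(elapsed - cpu, 0))
}

# Stand-in data (assumption): a small tab-separated temporary file.
tmp <- tempfile(fileext = ".tsv")
write.table(data.frame(a = 1:1000, b = rnorm(1000)), tmp,
            sep = "\t", row.names = FALSE, quote = FALSE)

res <- time_breakdown(read.delim(tmp))
print(res)
```

If the waiting component dominates on the first read of a large file, the time is probably going to disk or network transfer rather than to parsing.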

Once the data has been loaded, however, it is temporarily held in your working memory, i.e. RAM. Successive calls to fread on the same data are therefore much faster.
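On the question's second point, reading a single column, one option worth trying is fread's select argument, which asks fread to return only the named or numbered columns. The sketch below uses a tiny made-up tab-separated file in place of the real GFF files. It is not a claim about how much time this saves: the whole file still has to be read from disk or the network, so select would not remove a transfer bottleneck like the one described above.

```r
library(data.table)

# Toy stand-in for a headerless, tab-separated 10-column file (made-up values).
tmp <- tempfile(fileext = ".gff")
toy <- data.table(V1 = "chr1", V2 = "src", V3 = "SNP", V4 = 1:5, V5 = 1:5,
                  V6 = c(0.1, 0.2, 0.3, 0.4, 0.5),
                  V7 = "+", V8 = ".", V9 = ".", V10 = "x")
fwrite(toy, tmp, sep = "\t", col.names = FALSE)

# select = 6 keeps only the 6th column; [[1]] extracts it as a plain vector.
v6 <- fread(tmp, sep = "\t", header = FALSE, select = 6)[[1]]
print(v6)
```

For the files in the question, the equivalent call would be along the lines of fread("control.major.gff", select = 6)[[1]].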
