Comparing speed of fread vs. read.table for reading the first 1M rows out of 100M
Question
I have a 14GB data.txt file. I was comparing the speed of fread and read.table by reading the first 1M rows. It looks like fread is much slower, although it is not supposed to be. It takes some time until the percentage counter shows up.
What could be the reason? I thought it was supposed to be super fast... I am using a Windows computer.
Recommended answer
fread mmaps the file. This takes some time, and it maps the whole file, not just the rows you asked for. This means subsequent "read-ins" will be faster.
read.table does not mmap the whole file. It can read in the file line by line [and stop at line 1000000].
You can see some background on mmap at "mmap() vs. reading blocks".
The examples in the help for fread highlight this behaviour.
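A rough way to try the comparison on a small scale is sketched below, assuming the data.table package is installed. The temporary file, its row counts, and its column names are made up for illustration and are not from the question. Timings on a file this small will not reproduce the 14GB case and may well favour either function.

```r
# Sketch: time reading the first n rows with read.table and with fread.
# The file is a small synthetic stand-in, not the 14GB data.txt.
library(data.table)

path <- tempfile(fileext = ".txt")
total_rows <- 200000L  # hypothetical file size, chosen for illustration
n <- 10000L            # hypothetical number of rows to read

# Write a tab-separated file with a header row.
write.table(
  data.frame(id = seq_len(total_rows), x = rnorm(total_rows)),
  path, sep = "\t", row.names = FALSE, quote = FALSE
)

# read.table can stop after n data rows.
t_base <- system.time(
  df <- read.table(path, header = TRUE, sep = "\t", nrows = n)
)

# fread is asked for the same n rows.
t_fread <- system.time(
  dt <- fread(path, nrows = n, showProgress = FALSE)
)

# Elapsed wall-clock seconds for each reader.
print(c(read.table = t_base[["elapsed"]], fread = t_fread[["elapsed"]]))

unlink(path)
```

Running the fread call a second time on the same file is a simple way to check the "subsequent read-ins will be faster" claim on your own machine.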