Fast reading and combining several files using data.table (with fread)

Question

I have several different txt files with the same structure. Now I want to read them into R using fread, and then union them into a bigger dataset.

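## First collect all file names in a character vector
library(data.table)
# pattern is a regular expression, so escape the dot and anchor the extension;
# full.names = TRUE returns paths that fread() can open from any working directory
all.files <- list.files(path = "C:/Users", pattern = "\\.txt$", full.names = TRUE)

## Read data using fread
readdata <- function(fn) {
    dt_temp <- fread(fn, sep = ",")
    keycols <- c("ID", "date")
    setkeyv(dt_temp, keycols)  # setkeyv() takes the key columns as a character vector
    return(dt_temp)
}

## Then read every file and stack the results
mylist <- lapply(all.files, readdata)
mydata <- do.call('rbind', mylist)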

The code works fine, but the speed is not satisfactory. Each txt file has 1M observations and 12 fields.

If I use fread to read a single file, it's fast. But with lapply it is extremely slow, and it clearly takes much longer than reading the files one by one. I wonder what went wrong here. Is there any way to improve the speed?

I tried llply from the plyr package, but there was not much of a speed gain.

Also, is there any syntax in data.table to achieve a vertical join, like rbind in R or UNION in SQL?

Thanks.

Recommended answer

Use rbindlist(), which is designed to rbind a list of data.tables together...

mylist <- lapply(all.files, readdata)
mydata <- rbindlist( mylist )
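
On the "vertical join" part of the question, here is a minimal sketch under stated assumptions: the tables a and b are made-up data, not the asker's files. It suggests that rbindlist() keeps every row (much like UNION ALL in SQL), and that wrapping the result in unique() gives UNION-like deduplication.

```r
library(data.table)

# Two small made-up tables with the same structure (illustrative data only)
a <- data.table(ID = c(1L, 2L), date = c("2013-01-01", "2013-01-02"), value = c(10, 20))
b <- data.table(ID = c(2L, 3L), date = c("2013-01-02", "2013-01-03"), value = c(20, 30))

# rbindlist() stacks every table in the list and keeps duplicate rows (like UNION ALL)
stacked <- rbindlist(list(a, b))

# The result holds the same rows as do.call(rbind, ...)
same_as_rbind <- isTRUE(all.equal(stacked, do.call(rbind, list(a, b))))

# Dropping duplicate rows afterwards gives UNION-like behaviour
deduped <- unique(stacked)
```

Here the row for ID 2 appears in both inputs, so stacked has four rows and deduped has three.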

And as @Roland says, do not set the key in each iteration of your function!
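
A small sketch of why the per-file key is wasted effort, using made-up tables p1 and p2 in place of files read by fread(): each setkeyv() call sorts its piece, yet the stacked result does not appear to carry a key over, so the sort has to be done again on the combined table anyway.

```r
library(data.table)

# Made-up pieces standing in for two files read by fread()
p1 <- data.table(ID = c(2L, 1L), date = c("2013-01-02", "2013-01-01"))
p2 <- data.table(ID = c(4L, 3L), date = c("2013-01-04", "2013-01-03"))

# Keying each piece sorts it, which costs time for every file
setkeyv(p1, c("ID", "date"))
setkeyv(p2, c("ID", "date"))

# Stack the keyed pieces and check whether the result is keyed
combined <- rbindlist(list(p1, p2))
keyed_before <- haskey(combined)

# Setting the key once, on the combined table
setkey(combined, ID, date)
```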

So in summary, this is best:

l <- lapply(all.files, fread, sep=",")
dt <- rbindlist( l )
setkey( dt , ID, date )
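
For anyone who wants to try the pattern without the original files, the following is a self-contained sketch. The temp folder, the file names part1.txt to part3.txt, the value column, and the "source" column name are all hypothetical. It writes three small comma-separated files, reads them with fread, stacks them with rbindlist, and sets the key once at the end.

```r
library(data.table)

# Hypothetical setup: write three small comma-separated .txt files to a temp folder
tmp_dir <- file.path(tempdir(), "fread_demo")
dir.create(tmp_dir, showWarnings = FALSE)
for (i in 1:3) {
  fwrite(
    data.table(ID = i, date = c("2013-01-02", "2013-01-01"), value = c(1.5, 2.5) * i),
    file.path(tmp_dir, paste0("part", i, ".txt"))
  )
}

# full.names = TRUE returns paths fread() can open from any working directory
files <- list.files(tmp_dir, pattern = "\\.txt$", full.names = TRUE)

# Read each file, then stack; idcol records which list element each row came from
pieces <- lapply(files, fread, sep = ",")
names(pieces) <- basename(files)
dt <- rbindlist(pieces, idcol = "source")

# Key once, on the combined table
setkey(dt, ID, date)
```

The idcol argument is optional. It is only there to make it easy to trace a row back to its file.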
