Quick Read and Merge with data.table's fread and rbindlist
Question
I am looking for a way to quickly read and merge a bunch of data files using data.table's fread and rbindlist functions. If fread could take a vector of file names as an argument, it could be one elegant line like
mergeddata = rbindlist(fread(list.files("my/data/directory/")))
but since that doesn't seem to be an option, I've taken the more awkward approach of looping through the files, reading each one in and assigning it to a temporary name, and then putting together a list of the temporary data.table names. However, I get tripped up whenever I try to use that list of data.table names. So my questions are: (1) how can I pass a list of data.table names to rbindlist in this context, and (2) more broadly, is there a better approach to this problem?
Thanks in advance for the time and help!
datafiles = list.files()
datatablelist = c()
for(i in 1:length(datafiles)){
  assign(paste("dt",i,sep=""),fread(datafiles[1]))
  datatablelist = append(datatablelist ,paste("dt",i,sep=""))
}
mergeddata = rbindlist(list(datatablelist))
Recommended answer
You could do
datatablelist = lapply(list.files("my/data/directory/", full.names = TRUE), fread)
and then bind the resulting list of data.tables with rbindlist(datatablelist). By default list.files returns bare file names, so full.names = TRUE is needed unless the working directory is already the data directory.
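A minimal, self-contained sketch of this approach, assuming the data.table package is installed. The temporary directory, the two CSV files, and the `source` column name are made up for illustration and are not part of the original answer:

```r
library(data.table)

# Hypothetical setup: write two small CSV files into a temporary directory.
datadir <- file.path(tempdir(), "fread_demo")
dir.create(datadir, showWarnings = FALSE)
fwrite(data.table(id = 1:2, value = c("a", "b")), file.path(datadir, "part1.csv"))
fwrite(data.table(id = 3:4, value = c("c", "d")), file.path(datadir, "part2.csv"))

# full.names = TRUE returns paths that fread can open from any working directory.
datafiles <- list.files(datadir, pattern = "\\.csv$", full.names = TRUE)

# Read every file into a list of data.tables, named after the files.
datatablelist <- lapply(datafiles, fread)
names(datatablelist) <- basename(datafiles)

# Stack the tables; idcol records which file each row came from.
mergeddata <- rbindlist(datatablelist, idcol = "source")
print(mergeddata)
```

Naming the list elements is optional; without names, `idcol` would presumably be less useful because it falls back to the position of each table in the list.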
Although lapply is cleaner than an explicit loop, your loop will also work if you read the files directly into a list instead of assigning them to separate variables.
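datatablelist = list()
# seq_along() also behaves correctly when datafiles is empty
for(i in seq_along(datafiles)){
  # store each table in the list, keyed by its file name
  datatablelist[[datafiles[i]]] = fread(datafiles[i])
}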
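As for question (1): rbindlist needs a list of the tables themselves, not a character vector of their names. If the tables already exist as separate variables, one possible way to get from names to objects is base R's `mget`. The sketch below is not from the original answer, and `dt1`, `dt2`, and `datatablenames` are hypothetical stand-ins for what the question's loop creates:

```r
library(data.table)

# Hypothetical tables standing in for the dt1, dt2, ... created by assign().
dt1 <- data.table(id = 1:2, value = c("a", "b"))
dt2 <- data.table(id = 3:4, value = c("c", "d"))

# A character vector of variable names, as built in the question's loop.
datatablenames <- c("dt1", "dt2")

# mget() looks the names up and returns a named list of the objects.
mergeddata <- rbindlist(mget(datatablenames))
print(mergeddata)
```

Reading straight into a list, as above, avoids the name lookup entirely and is usually the simpler choice.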