Read in and merge many CSV files into data.table
I have many .csv files containing variables for the same "population", keyed by surname and first.name. Every csv therefore has three columns: first name, surname, and the variable of interest.
I load each one into a separate data table, and then I want to merge them.
library(data.table)
surnames <- c('A', 'B')
first.names <- c('C', 'D')
weights <- c(80, 90)
heights <- c(180, 190)
write.csv(data.frame(surname = surnames, first.name = first.names,
height = heights), file = 'variable-height.csv')
write.csv(data.frame(surname = surnames, first.name = first.names,
weight = weights), file = 'variable-weight.csv')
variables.to.load <- c('height', 'weight')
for (i in variables.to.load) {
assign(paste0('DT.', i), fread(paste0('variable-', i, '.csv')))
print(dim(eval(parse(text = paste0('DT.', i)))))
setkey(eval(parse(text = paste0('DT.', i))), surname, first.name)
}
This loads them and sets the keys correctly. What I am missing, though, is the automatic merging.
DT.merged <- Reduce(merge, list(DT.height, DT.weight))
works, but I want to do it automatically, since there are many more real variables. That is, I want to generate the contents of list() (DT.height, DT.weight, etc.) automatically.
I have tried:
library('stringr')
DT.merged <- Reduce(merge, list(eval(parse(text = str_c(paste0('DT.', variables.to.load), collapse = ', ')))))
with no results.
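The attempt above is likely to fail because the pasted string 'DT.height, DT.weight' is not a single valid R expression, so parse() raises a syntax error at the comma. If the tables already exist under predictable names, base R's mget() can fetch them by name as a list. The following is only a sketch: the two small tables stand in for the ones created by the assign() loop, and the explicit by argument is an added assumption to make the join columns visible.

```r
library(data.table)

# Stand-ins for the tables created by the assign() loop above
DT.height <- data.table(surname = c("A", "B"), first.name = c("C", "D"),
                        height = c(180, 190), key = c("surname", "first.name"))
DT.weight <- data.table(surname = c("A", "B"), first.name = c("C", "D"),
                        weight = c(80, 90), key = c("surname", "first.name"))

variables.to.load <- c("height", "weight")

# mget() looks up several objects by name and returns them as a named list
DTs <- mget(paste0("DT.", variables.to.load))

# Fold the list into one table, joining on the identifying columns
DT.merged <- Reduce(function(x, y) merge(x, y, by = c("surname", "first.name")),
                    DTs)
print(DT.merged)
```

This keeps the existing naming scheme, although collecting the tables in a list from the start avoids assign() and eval(parse()) altogether.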
I do the whole process this way because I want to pick different variables for my population selectively (the full data amount to a csv of more than 30GB with around 30 variables). Using fread on the full csv to selectively read columns seems rather slow.
This should work for your question:
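# Read each file into its own keyed data.table and collect them in a list
DTlist <- lapply(paste0('variable-', variables.to.load, '.csv'),
                 function(x) {
                   d <- fread(x)
                   setkey(d, surname, first.name)
                   d
                 }
)
# Merge the list pairwise on the shared key columns
DT.merged <- Reduce(merge, DTlist)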
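For anyone who wants to try the list-based approach end to end, here is a self-contained sketch. The sample values, the temporary directory, row.names = FALSE in write.csv, and the explicit by argument are assumptions added for illustration. Without row.names = FALSE, write.csv also writes a row-name column, which fread would read as an extra column in every table.

```r
library(data.table)

# Hypothetical sample data, written to a temporary directory
tmp <- tempfile("csv-merge-")
dir.create(tmp)
people <- data.frame(surname = c("A", "B"), first.name = c("C", "D"))
write.csv(cbind(people, height = c(180, 190)),
          file.path(tmp, "variable-height.csv"), row.names = FALSE)
write.csv(cbind(people, weight = c(80, 90)),
          file.path(tmp, "variable-weight.csv"), row.names = FALSE)

variables.to.load <- c("height", "weight")
files <- file.path(tmp, paste0("variable-", variables.to.load, ".csv"))

# Read each file into a keyed data.table, collecting them in a list
DTlist <- lapply(files, function(f) {
  d <- fread(f)
  setkey(d, surname, first.name)
  d
})

# Fold the list into one table, joining on the identifying columns
DT.merged <- Reduce(function(x, y) merge(x, y, by = c("surname", "first.name")),
                    DTlist)
print(DT.merged)
```

Naming the join columns explicitly is optional when every table carries the same key, but it keeps the merge from depending on whatever key happens to be set.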
That being said, as Roland and I allude to in comments, this is unlikely to be the best approach if you have access to a single CSV file with all your desired data.
If you do have access to such a file, you'd be better served by the select parameter of fread:
DT <- fread('master.csv', select=c(variables.to.load))
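A small sketch of the select approach follows. The combined file, its location in a temporary directory, and the extra age column are hypothetical. One thing to watch: select keeps only the columns it names, so the identifying columns probably need to be listed alongside the variables of interest.

```r
library(data.table)

# Hypothetical combined file; its path and the extra 'age' column are assumptions
master <- file.path(tempdir(), "master.csv")
fwrite(data.table(surname = c("A", "B"), first.name = c("C", "D"),
                  height = c(180, 190), weight = c(80, 90), age = c(30, 40)),
       master)

variables.to.load <- c("height", "weight")

# Keep the identifying columns alongside the requested variables
DT <- fread(master, select = c("surname", "first.name", variables.to.load))
setkey(DT, surname, first.name)
print(names(DT))
```

Whether this is faster than merging many per-variable files on a 30GB input is something to measure on the real data rather than assume.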