Read in and merge many CSV files into data.table


Problem description


I have many .csv files containing variables for the same "population", keyed by surname and first.name. Every csv therefore has three columns: first name, surname and the variable of interest. I load each of them into a separate data table, and then I want to merge them.

library(data.table)
surnames <- c('A', 'B')
first.names <- c('C', 'D')
weights <- c(80, 90)
heights <- c(180, 190)

# row.names = FALSE keeps each file at exactly three columns
write.csv(data.frame(surname = surnames, first.name = first.names,
                     height = heights), file = 'variable-height.csv',
          row.names = FALSE)
write.csv(data.frame(surname = surnames, first.name = first.names,
                     weight = weights), file = 'variable-weight.csv',
          row.names = FALSE)

variables.to.load <- c('height', 'weight')
for (i in variables.to.load) {
  assign(paste0('DT.', i), fread(paste0('variable-', i, '.csv')))
  print(dim(eval(parse(text = paste0('DT.', i)))))
  setkey(eval(parse(text = paste0('DT.', i))), surname, first.name)
}

This loads them and sets the keys correctly. What I am missing, though, is the automatic merging.

DT.merged <- Reduce(merge, list(DT.height, DT.weight))

This works, but I want to automate it, since in reality there are many more variables. That is, I want to generate the contents of list() (DT.height, DT.weight, etc.) automatically.

I have tried:

library('stringr')
DT.merged <- Reduce(merge, list(eval(parse(text = str_c(paste0('DT.', variables.to.load), collapse = ', ')))))

without success.

I go through this whole process because I want to select different subsets of variables for my population (the full data set is a csv of more than 30GB with around 30 variables). Using fread on the full csv to selectively read columns therefore seems rather slow.

Solution

This should work for your question:

DTlist <- lapply(paste0('variable-', variables.to.load, '.csv'), 
    function(x) {
       d <- fread(x) 
       setkey(d, surname, first.name)
       d
     }
   )
DT.merged <- Reduce(merge, DTlist)
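
If the tables already exist under names such as DT.height and DT.weight (as created by the loop in the question), one possible way to build the list without eval(parse()) is base R's mget(), which looks up several objects by name and returns them as a named list. The following is only a sketch under that assumption; the two toy tables are stand-ins for the real data, and the explicit by argument is a choice made here so the merge does not depend on the keys being set.

```r
library(data.table)

# Toy stand-ins for the DT.height / DT.weight objects from the question
DT.height <- data.table(surname = c('A', 'B'), first.name = c('C', 'D'),
                        height = c(180, 190))
DT.weight <- data.table(surname = c('A', 'B'), first.name = c('C', 'D'),
                        weight = c(80, 90))
variables.to.load <- c('height', 'weight')

# mget() fetches the objects named DT.height, DT.weight, ... as a named list
DTlist <- mget(paste0('DT.', variables.to.load))

# Merge the tables pairwise on the shared identifier columns
DT.merged <- Reduce(
  function(x, y) merge(x, y, by = c('surname', 'first.name')),
  DTlist
)
print(DT.merged)
```

Reading the files straight into a list with lapply, as in the answer above, avoids creating the separate DT.* objects in the first place.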

That being said, as Roland and I allude to in comments, this is unlikely to be the best approach if you have access to a single CSV file with all your desired data.

If you do have access to such a file, you'd be better served by using the select parameter of fread:

DT <- fread('master.csv', select=c(variables.to.load))
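
Note that select keeps only the listed columns, so if the identifier columns are needed as well they have to be listed too. A minimal sketch of that, assuming a master file laid out like the toy data in the question: the temporary file, the extra age column and the id.cols name are hypothetical and exist only to make the example self-contained.

```r
library(data.table)

# Hypothetical stand-in for 'master.csv': one file holding every variable
master.path <- tempfile(fileext = '.csv')
fwrite(data.table(surname = c('A', 'B'), first.name = c('C', 'D'),
                  height = c(180, 190), weight = c(80, 90),
                  age = c(30, 40)),
       master.path)

variables.to.load <- c('height', 'weight')
id.cols <- c('surname', 'first.name')  # identifier columns to keep alongside

# Read only the identifier columns plus the requested variables
DT <- fread(master.path, select = c(id.cols, variables.to.load))
setkeyv(DT, id.cols)
print(DT)
```

Whether this is faster than merging per-variable files on a 30GB input is something to measure on the real data rather than assume.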
