How to bind data.table without increasing the memory consumption?

Problem description

I have a few huge data.tables dt_1, dt_2, ..., dt_N with the same columns. I want to bind them together into a single data.table. If I use

dt <- rbind(dt_1, dt_2, ..., dt_N)

or

dt <- rbindlist(list(dt_1, dt_2, ..., dt_N))

then the memory usage is approximately double the amount needed for dt_1, dt_2, ..., dt_N. Is there a way to bind them without increasing the memory consumption significantly? Note that I do not need dt_1, dt_2, ..., dt_N once they are combined.

Recommended answer

Another approach is to use a temporary file to 'bind' the tables:

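library(data.table)

nobs <- 10000
# Three example tables (they hold identical data here, which is fine for a demo)
d1 <- d2 <- d3 <- data.table(a=rnorm(nobs), b=rnorm(nobs))
ll <- c('d1','d2','d3')
tmp <- tempfile()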

# Write all, writing header only for the first one
for(i in seq_along(ll)) {
  write.table(get(ll[i]),tmp,append=(i!=1),row.names=FALSE,col.names=(i==1))
}

# Remove the original objects; the gc reclaims their memory if needed (e.g. while loading the file)
rm(list=ll)

# Read the file in the new object
dt<-fread(tmp)

# Remove the file
unlink(tmp)
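
The same idea can also be sketched with data.table's own fwrite, which is usually much faster than write.table. This is a minimal variant, not the answer's original code: the tables d1, d2, d3 are hypothetical stand-ins for dt_1, ..., dt_N, and each one is removed as soon as it has been written, so the originals and the combined table are never held in memory together. Note that a round trip through text may not preserve every last digit of floating-point values.

```r
library(data.table)

# Hypothetical example tables; in practice these are the existing dt_1 ... dt_N
nobs <- 10000
d1 <- data.table(a = rnorm(nobs), b = rnorm(nobs))
d2 <- data.table(a = rnorm(nobs), b = rnorm(nobs))
d3 <- data.table(a = rnorm(nobs), b = rnorm(nobs))
ll <- c("d1", "d2", "d3")
tmp <- tempfile(fileext = ".csv")

for (i in seq_along(ll)) {
  # Write the header only for the first table; append the others below it
  fwrite(get(ll[i]), tmp, append = (i != 1), col.names = (i == 1))
  # Drop each table as soon as it is on disk
  rm(list = ll[i])
}

# Read everything back as a single data.table, then delete the file
dt <- fread(tmp)
unlink(tmp)
```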

This is obviously slower than the rbind method, but under memory contention it should be no slower than forcing the system to swap memory pages out.

Of course, if your original objects were loaded from files in the first place, it is better to concatenate the files before loading them into R, using a tool better suited to working with files (cat, awk, etc.).
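
As a rough illustration of concatenating before loading, fread can read the output of a shell command through its cmd argument. The sketch below assumes a Unix-like system with awk on the PATH and uses hypothetical file names; the awk filter keeps the header of the first file and skips the header line of every later file, so only the combined table is ever created in R.

```r
library(data.table)

# Hypothetical input files standing in for the original data sources
files <- file.path(tempdir(), paste0("part_", 1:3, ".csv"))
for (f in files) fwrite(data.table(a = 1:5, b = letters[1:5]), f)

# awk prints the very first line (the header) and every non-header line
# of all files, so the header appears exactly once in the output
cmd <- paste("awk 'FNR > 1 || NR == 1'",
             paste(shQuote(files), collapse = " "))

# Read the concatenated stream directly into one data.table
dt <- fread(cmd = cmd)
unlink(files)
```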
