write.csv for large data.table


Problem description

I have a data.table that is not very big (2 GB), but for some reason write.csv takes an extremely long time to write it out (I have never actually waited for it to finish) and seems to use a ton of RAM to do it.

I tried converting the data.table to a data.frame, although this shouldn't really change anything since data.table extends data.frame. Has anyone run into this?

More importantly, if you stop it with Ctrl-C, R does not seem to give the memory back.

Recommended answer

UPDATE 2019.01.07:

fwrite has been on CRAN since 2016-11-25, so a regular install of data.table is enough:

install.packages("data.table")

UPDATE 08.04.2016:

fwrite has recently been added to the development version of the data.table package. It also runs in parallel (implicitly).

# Install development version of data.table
install.packages("data.table", 
                  repos = "https://Rdatatable.github.io/data.table", type = "source")

# Load package
library(data.table)

# Load data        
data(USArrests)

# Write CSV
fwrite(USArrests, "USArrests_fwrite.csv")
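
Because fwrite parallelises implicitly, it can be useful to check or cap the number of threads it uses. The following is a minimal sketch, assuming a data.table release whose fwrite accepts the nThread argument; the temporary output path is chosen only for illustration:

```r
library(data.table)

# Number of threads data.table currently uses by default
getDTthreads()

# Hypothetical output path, used only for this illustration
out_file <- tempfile(fileext = ".csv")

# Cap fwrite at two threads for this call only
fwrite(USArrests, out_file, nThread = 2)

# Read the file back to confirm that all rows were written
written <- fread(out_file)
nrow(written)
```

Note that fwrite does not write row names by default, so the state names of USArrests are not part of the file in this sketch.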

According to the detailed benchmarks shown in the question "speeding up the performance of write.table", fwrite is ~17x faster than write.csv there (YMMV).
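
To see how the two writers compare on your own machine, a rough timing sketch along these lines may help. The synthetic table, its size, and its column types are arbitrary assumptions, and the resulting ratio will vary with hardware, data types, and thread count:

```r
library(data.table)

# Synthetic data; size and column types are arbitrary assumptions
set.seed(1)
n <- 2e5
dt <- data.table(
  id = seq_len(n),
  x  = rnorm(n),
  g  = sample(letters, n, replace = TRUE)
)

# Temporary output files, used only for this illustration
csv_base <- tempfile(fileext = ".csv")
csv_fast <- tempfile(fileext = ".csv")

# Elapsed wall-clock time for each writer
t_base <- system.time(write.csv(dt, csv_base, row.names = FALSE))["elapsed"]
t_fast <- system.time(fwrite(dt, csv_fast))["elapsed"]

# Named vector with both timings in seconds
timings <- c(write.csv = unname(t_base), fwrite = unname(t_fast))
timings
```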

UPDATE 15.12.2015:

In the future there might be an fwrite function in the data.table package, see: https://github.com/Rdatatable/data.table/issues/580. That thread links to a Gist with a prototype for such a function, which speeds up the process by a factor of 2 (according to its author, https://gist.github.com/oseiskar/15c4a3fd9b6ec5856c89).

ORIGINAL ANSWER:

I had the same problems (trying to write even larger CSV files) and finally decided against using CSV files.

I would recommend using SQLite, as it is much faster than dealing with CSV files:

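library(DBI)
library(RSQLite)
# Set up the database connection
con <- dbConnect(RSQLite::SQLite(), dbname = "test.db")
# Load example data
data(USArrests)
# Write the data frame "USArrests" to table "arrests" in database "test.db".
# row.names = TRUE keeps the state names as a "row_names" column
# (newer RSQLite versions drop row names by default)
dbWriteTable(con, "arrests", USArrests, row.names = TRUE)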

# Test if the data was correctly stored in the database, i.e. 
# run an exemplary query on the newly created database 
dbGetQuery(con, "SELECT * FROM arrests WHERE Murder > 10")       
# row_names Murder Assault UrbanPop Rape
# 1         Alabama   13.2     236       58 21.2
# 2         Florida   15.4     335       80 31.9
# 3         Georgia   17.4     211       60 25.8
# 4        Illinois   10.4     249       83 24.0
# 5       Louisiana   15.4     249       66 22.2
# 6        Maryland   11.3     300       67 27.8
# 7        Michigan   12.1     255       74 35.1
# 8     Mississippi   16.1     259       44 17.1
# 9          Nevada   12.2     252       81 46.0
# 10     New Mexico   11.4     285       70 32.1
# 11       New York   11.1     254       86 26.1
# 12 North Carolina   13.0     337       45 16.1
# 13 South Carolina   14.4     279       48 22.5
# 14      Tennessee   13.2     188       59 26.9
# 15          Texas   12.7     201       80 25.5

# Close the connection to the database
dbDisconnect(con)

For further information, see http://cran.r-project.org/web/packages/RSQLite/RSQLite.pdf

You can also use software such as http://sqliteadmin.orbmu2k.de/ to access the database and export it to CSV, etc.
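
If a CSV file is still needed later, the export can also be done from R itself instead of an external tool. This is only a sketch under stated assumptions: the temporary database and CSV paths, the chunk size of 20 rows, and the use of fwrite for the output are illustrative choices, not part of the original answer. It fetches the table in chunks so that the full result never has to sit in memory at once:

```r
library(DBI)
library(RSQLite)
library(data.table)

# Temporary database and CSV paths, chosen only for this illustration
db_file  <- tempfile(fileext = ".db")
csv_file <- tempfile(fileext = ".csv")

con <- dbConnect(RSQLite::SQLite(), dbname = db_file)
dbWriteTable(con, "arrests", USArrests, row.names = TRUE)

# Fetch the table chunk by chunk and append each chunk to the CSV file
res <- dbSendQuery(con, "SELECT * FROM arrests")
first_chunk <- TRUE
while (!dbHasCompleted(res)) {
  chunk <- dbFetch(res, n = 20)  # chunk size is an arbitrary assumption
  if (nrow(chunk) > 0) {
    # The first chunk creates the file with a header; later chunks append
    fwrite(chunk, csv_file, append = !first_chunk)
    first_chunk <- FALSE
  }
}
dbClearResult(res)
dbDisconnect(con)

# Read the exported file back to check the result
exported <- fread(csv_file)
```

For a real 2 GB table the chunk size would be much larger; a small value is used here only because USArrests has few rows.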
