How do I import a large (6 Gb) .csv file into R efficiently and quickly, without the R REPL crashing?


Problem description

I have a large .csv file which I need to import into R in order to do some data manipulation on it. I'm using the read.csv(file.csv) method, where I assign the result of the method to some variable MyData. However, when I attempt to run this in the R REPL, the program crashes. Is there a way to efficiently and quickly process/read a .csv file in R that won't crash the terminal? If there isn't, shouldn't I just be using Python?

Recommended answer

R will crash if you try to load a file that is larger than your available memory, so you should check that you have at least 6 GB of RAM free (a 6 GB .csv is also roughly 6 GB in memory). Python will have the same problem (apparently someone asked the exact same question for Python a few years ago).

For reading large csv files, you should either use readr::read_csv() or data.table::fread(), as both are much faster than base::read.table().
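As a minimal sketch (assuming the file is called "file.csv" in your working directory and the data.table and readr packages are installed), either of these would replace the read.csv() call:

```r
library(data.table)

# fread() auto-detects the separator and column types and reads in parallel,
# so it is typically much faster than base::read.csv() on a 6 GB file.
MyData <- fread("file.csv")

# Equivalent with readr, also much faster than base::read.table():
# library(readr)
# MyData <- read_csv("file.csv")
```

Both return the data in a single object (a data.table or tibble), so the rest of your code can stay largely the same.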

readr::read_csv_chunked supports reading csv files in chunks, so if you don't need your whole data at once, that might help. You could also try just reading the columns of interest, to keep the memory size smaller.
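For example, a rough sketch of chunked reading (the column names "id" and "value" and the filter are hypothetical, just to illustrate keeping only part of each chunk):

```r
library(readr)

# Callback applied to each chunk; readr row-binds the returned data frames.
# Here we keep only the rows we care about, so the full 6 GB never sits in memory.
keep_positive <- function(chunk, pos) {
  chunk[chunk$value > 0, ]
}

MyData <- read_csv_chunked(
  "file.csv",
  callback = DataFrameCallback$new(keep_positive),
  chunk_size = 100000  # rows per chunk
)

# In newer versions of readr you can also read just the columns of interest:
# MyData <- read_csv("file.csv", col_select = c(id, value))
```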
