R: How to quickly read large .dta files without RAM limitations


Problem description

I have a 10 GB .dta Stata file and I am trying to read it into 64-bit R 3.3.1. I am working on a virtual machine with about 130 GB of RAM (and a 4 TB hard drive), and the .dta file has roughly 3 million rows and somewhere between 400 and 800 variables.

I know the data.table package (fread()) is the fastest way to read in .txt and .csv files, but does anyone have a recommendation for reading largish .dta files into R? Reading the file into Stata as a .dta file takes about 20-30 seconds, although I need to raise Stata's maximum working memory before opening the file (I set the max at 100 GB).

I have not tried exporting to .csv from Stata, because I would prefer to avoid touching the file with Stata at all. One solution is described in "Using memisc to import stata .dta file into R", but that approach assumes RAM is scarce. In my case, I should have enough RAM to work with the file.

Recommended answer

The fastest way to load a large Stata dataset into R is with the readstata13 package. I compared the performance of the foreign, readstata13, and haven packages on a large dataset in this post, and the results repeatedly showed that readstata13 is the fastest available package for reading Stata datasets into R.
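A minimal sketch of how the comparison could look in practice, assuming the readstata13 and haven packages are installed; "large_file.dta" is a placeholder path, and timings will of course depend on the machine and the file:

```r
# Rough sketch, not a formal benchmark: tries the readers discussed above.
# "large_file.dta" is a hypothetical path; argument defaults may differ
# slightly across package versions.
library(readstata13)
library(haven)

path <- "large_file.dta"  # placeholder for the 10 GB Stata file

# readstata13: the package recommended above; skipping factor conversion
# can save time on very wide files.
system.time(dat_rs13 <- read.dta13(path, convert.factors = FALSE))

# haven: returns a tibble with labelled columns, for comparison.
system.time(dat_haven <- read_dta(path))

# foreign::read.dta only reads files saved by Stata 12 or earlier, so it
# may refuse a .dta written by a newer Stata release.
# system.time(dat_foreign <- foreign::read.dta(path))
```

If only a subset of the 400-800 variables is actually needed, it is also worth checking whether your readstata13 version supports selecting columns at read time, which avoids materialising unused columns at all.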
