Sharing large datasets between Matlab and R

Problem description

I need a relatively efficient way to share data between Matlab and R.

I have checked SaveR and MATLAB R-link, but SaveR formats Matlab's binary data as text strings first and then prints them to an ASCII file, which is not efficient for large datasets, and MATLAB R-link only works on Windows (it uses a COM-based interface).

Update:

Dirk has posted a list of what seem to be better solutions to this problem than SaveR and Matlab R-link. I also learned recently about RAM disks (see here and here for some implementation examples), and thought that they might facilitate the task of sharing large datasets between Matlab and R (or similar computational environments) further. This leads me to the following questions:

Assuming that the data fits in the machine's memory in Matlab's or R's native data containers:

  1. Are any of the solutions listed so far a better fit for RAM disks?

  2. Are there any additional considerations to take into account when dealing with RAM disks instead of secondary-storage solutions?

Thanks!

Solution

A couple of ideas, with the caveat that I know more about the R side of things:

  • The R.matlab package on CRAN can help: it provides methods to read and write MAT files, and it can also communicate (evaluate code, send and retrieve objects, etc.) with Matlab v6 or higher running locally or on a remote host.

  • HDF5, as you suggested, is a possibility but I heard that the R support in CRAN package hdf5 is somewhat basic

  • NetCDF may be an alternative; CRAN has packages RNetCDF, ncdf and ncdf4

  • Use a database, especially a light, file-based one like SQLite or H4, both of which have R support

  • Use a common serialization / de-serialization format; R has support for Google Protocol Buffers via RProtoBuf and Google points to protobuf-matlab for Matlab

  • Write your own! Especially when you only need something basic like large rectangular matrices, nothing will beat a direct binary write; I did this once years ago for Octave (which is close to Matlab). You can extend Matlab via mex files; R has its API and helpers like Rcpp. The larger your data sets, the more attractive this may look, as you save the conversions.
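
To make the "direct binary write" idea concrete, here is a minimal sketch in Python, used only as a neutral language to illustrate the file layout. The layout itself is an assumption of this sketch and not a standard format: two little-endian int32 values (rows, columns) followed by the values as little-endian float64 in column-major order, which is the order both Matlab and R use for matrices. The function names `write_matrix` and `read_matrix` are hypothetical.

```python
import struct


def write_matrix(path, rows):
    """Write a rectangular matrix (a list of equal-length rows) as raw binary.

    Assumed layout (illustrative, not a standard):
      int32 nrow, int32 ncol, then nrow*ncol float64 values,
      all little-endian, values stored column by column.
    """
    nrow = len(rows)
    ncol = len(rows[0]) if nrow else 0
    if any(len(r) != ncol for r in rows):
        raise ValueError("matrix must be rectangular")
    # Column-major: walk down each column before moving to the next one.
    flat = [rows[i][j] for j in range(ncol) for i in range(nrow)]
    with open(path, "wb") as f:
        f.write(struct.pack("<ii", nrow, ncol))
        f.write(struct.pack("<%dd" % len(flat), *flat))


def read_matrix(path):
    """Read a matrix written by write_matrix back into a list of rows."""
    with open(path, "rb") as f:
        nrow, ncol = struct.unpack("<ii", f.read(8))
        flat = struct.unpack("<%dd" % (nrow * ncol), f.read(8 * nrow * ncol))
    # Element (i, j) sits at offset j * nrow + i in column-major storage.
    return [[flat[j * nrow + i] for j in range(ncol)] for i in range(nrow)]
```

A file in this layout should be readable without any text conversion: in Matlab with `fopen` and `fread` (reading two `'int32'` values, then the `'double'` block), and in R with `readBin` followed by `matrix()`. In both cases the byte order has to be set to little-endian to match the writer. Because the bytes on disk are exactly the in-memory doubles, pointing `path` at a RAM disk needs no change to the code.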
