How can I pass large arrays between numpy and R?
Question
I'm using Python and NumPy/SciPy to do regex matching and stemming for a text processing application, but I also want to use some of R's statistical packages.
What's the best way to pass the data from Python to R (and back)?
Also, I need to back up the array to disk at some point, so I'm open to saving from Python and loading in R if that's the best solution. The matrices are pretty big (e.g. 100,000 x 10,000), so using sparse matrices might also be nice.
Apologies if this is a repost. I haven't been able to find anything that puts all these pieces together.
Recommended answer
Have you looked into RPy? It's a Python interface to R. I guess that would spare you the manual data handling.
To back up your NumPy arrays you can use pickle. However, pickle seems to add a lot of overhead when saving very large arrays, so NumPy arrays of that size are better saved in the HDF format. Here's an article covering that: http://www.shocksolution.com/2010/01/10/storing-large-numpy-arrays-on-disk-python-pickle-vs-hdf5adsf/
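As a rough illustration of the HDF route, here is a minimal sketch that writes a NumPy array to an HDF5 file and reads it back. It assumes the third-party h5py package is installed, which the answer does not name. The helper names, the dataset name "matrix", and the file name demo.h5 are made up for this example, and the small random array only stands in for the real matrix.

```python
import os
import tempfile

import h5py  # assumption: h5py is installed; the answer only says "HDF"
import numpy as np


def save_array_hdf5(path, array, name="matrix"):
    """Write `array` to an HDF5 file as a gzip-compressed dataset."""
    with h5py.File(path, "w") as f:
        f.create_dataset(name, data=array, compression="gzip")


def load_array_hdf5(path, name="matrix"):
    """Read the whole dataset back into memory as a NumPy array."""
    with h5py.File(path, "r") as f:
        return f[name][...]


# Small stand-in for the real 100,000 x 10,000 matrix.
rng = np.random.default_rng(0)
demo = rng.random((1000, 100))

# Hypothetical file location, chosen only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "demo.h5")
save_array_hdf5(path, demo)
restored = load_array_hdf5(path)
```

A dense 100,000 x 10,000 array of 64-bit floats takes about 8 GB, so the real data may need to be written in pieces or stored sparsely. On the R side, an HDF5 reader package such as rhdf5 should be able to open the same file. R stores arrays column-major, so the dimensions may come back transposed and are worth checking after loading.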