How can I pass large arrays between numpy and R?

Question

I'm using Python and numpy/scipy to do regex and stemming for a text processing application. But I want to use some of R's statistical packages as well.

What's the best way to pass the data from Python to R? (And back?)

Also, I need to back up the array to disk at some point, so I'm open to saving from Python and loading in R if that's the best solution. The matrices are pretty big (e.g. 100,000 x 10,000), so using sparse matrices might also be nice.

Apologies if this is a repost. I haven't been able to find anything that puts all these pieces together.

Recommended answer

    • Have you already looked into RPy? It's a Python interface to R. I'd guess that would spare you most of the data handling.

      To back up your NumPy arrays you can use pickle. However, pickle seems to create a lot of overhead when saving huge data, so large NumPy arrays are better saved in the HDF5 format. Here's an article covering that: http://www.shocksolution.com/2010/01/10/storing-large-numpy-arrays-on-disk-python-pickle-vs-hdf5adsf/
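
A minimal sketch of the HDF5 approach, assuming the third-party h5py package is installed (h5py is one common choice and an assumption here, not necessarily what the linked article uses). The helper names `save_array_hdf5` and `load_array_hdf5` and the dataset name `"matrix"` are hypothetical. It writes the array as a chunked, gzip-compressed dataset and reads it back:

```python
import os
import tempfile

import h5py
import numpy as np


def save_array_hdf5(path, array, name="matrix"):
    """Write a NumPy array to an HDF5 file as a chunked, gzip-compressed dataset.

    save_array_hdf5 is a hypothetical helper name used for illustration.
    """
    with h5py.File(path, "w") as f:
        # chunks=True lets h5py choose a chunk shape; gzip is lossless compression
        f.create_dataset(name, data=array, chunks=True, compression="gzip")


def load_array_hdf5(path, name="matrix"):
    """Read the whole dataset back into memory as a NumPy array (hypothetical helper)."""
    with h5py.File(path, "r") as f:
        # [...] materializes the full dataset as an in-memory ndarray
        return f[name][...]


if __name__ == "__main__":
    # Small stand-in for the real 100,000 x 10,000 matrix
    data = np.random.default_rng(0).random((1000, 50))
    path = os.path.join(tempfile.mkdtemp(), "matrix.h5")
    save_array_hdf5(path, data)
    restored = load_array_hdf5(path)
    print(np.array_equal(data, restored))
```

On the R side, a package such as the Bioconductor package rhdf5 (`h5read(file, "matrix")`) or the CRAN package hdf5r can read a file like this. Because R stores matrices in column-major order, expect the matrix to come back with its dimensions reversed (transposed) relative to NumPy, so check `dim()` after loading.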