Numpy array larger than RAM: write to disk or out-of-core solution?


Problem Description

I have the following workflow, whereby I append data to an empty pandas Series object. (This empty array could also be a NumPy array, or even a basic list.)

import pandas as pd

# `list_of_pandas_dataframes` and `compute_something` are defined elsewhere
in_memory_array = pd.Series([])

for df in list_of_pandas_dataframes:
    new = df.apply(lambda row: compute_something(row), axis=1)  # new is a pandas.Series
    # note: Series.append was removed in pandas 2.0; pd.concat([in_memory_array, new]) is the modern equivalent
    in_memory_array = in_memory_array.append(new)

My problem is that the resulting array in_memory_array becomes too large for RAM. I don't need to keep all objects in memory for this computation.

I think my options are to somehow pickle the object to disk once the array gets too big for RAM, e.g.

import pickle
import sys

# N = some size in bytes too large for RAM
# (for a pandas Series, in_memory_array.memory_usage(deep=True) is a more accurate byte count than sys.getsizeof)
if sys.getsizeof(in_memory_array) > N:
    with open('mypickle.pickle', 'wb') as f:
        pickle.dump(in_memory_array, f)

Otherwise, is there an out-of-core solution? The best case scenario would be to create some cap such that the object cannot grow larger than X GB in RAM.
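If you stay with the pickling approach, one way to get that cap is to flush the Series to a pickle file and reset it whenever it crosses a size threshold. Below is a minimal sketch of that idea, reusing the hypothetical list_of_pandas_dataframes and compute_something from the question; MAX_BYTES and the chunk filenames are purely illustrative.

import pickle
import pandas as pd

MAX_BYTES = 2 * 1024 ** 3            # illustrative cap: 2 GiB
chunk_paths = []                     # pickle files written so far
in_memory_array = pd.Series([], dtype=object)

for df in list_of_pandas_dataframes:
    new = df.apply(lambda row: compute_something(row), axis=1)
    in_memory_array = pd.concat([in_memory_array, new])

    # memory_usage(deep=True) reports the bytes actually held, unlike sys.getsizeof
    if in_memory_array.memory_usage(deep=True) > MAX_BYTES:
        path = f'chunk_{len(chunk_paths)}.pickle'
        with open(path, 'wb') as f:
            pickle.dump(in_memory_array, f)
        chunk_paths.append(path)
        in_memory_array = pd.Series([], dtype=object)    # reset so RAM usage stays bounded

# flush whatever is left at the end
if not in_memory_array.empty:
    path = f'chunk_{len(chunk_paths)}.pickle'
    with open(path, 'wb') as f:
        pickle.dump(in_memory_array, f)
    chunk_paths.append(path)

The results then live in a list of pickle files that can be reloaded and processed one chunk at a time.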

Recommended Answer

Check out this Python library: https://pypi.org/project/wendelin.core/. It allows you to work with arrays bigger than RAM and local disk.

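For comparison, plain NumPy also ships an out-of-core building block, numpy.memmap, which keeps the array backed by a file on disk and only pages the touched parts into RAM. This is not the wendelin.core API, just a rough sketch; the filename, dtype, array size, and chunk size below are illustrative.

import numpy as np

# create a disk-backed array; only the pages that are touched get loaded into RAM
n_rows = 100_000_000                       # illustrative size
arr = np.memmap('results.dat', dtype='float64', mode='w+', shape=(n_rows,))

# fill it in chunks so the working set stays small
chunk = 1_000_000
for start in range(0, n_rows, chunk):
    stop = min(start + chunk, n_rows)
    arr[start:stop] = np.random.random(stop - start)   # stand-in for the real per-row results

arr.flush()                                # make sure everything is written to disk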