Numpy array larger than RAM: write to disk or out-of-core solution?


Problem Description

I have the following workflow, whereby I append data to an empty pandas Series object. (This empty array could also be a NumPy array, or even a basic list.)

import pandas as pd

# `list_of_pandas_dataframes` and `compute_something` are defined elsewhere
in_memory_array = pd.Series([])

for df in list_of_pandas_dataframes:
    new = df.apply(lambda row: compute_something(row), axis=1)  # new is a pandas.Series
    # note: Series.append was removed in pandas 2.0; pd.concat([in_memory_array, new]) is the modern equivalent
    in_memory_array = in_memory_array.append(new)

My problem is that the resulting array in_memory_array becomes too large for RAM. I don't need to keep all objects in memory for this computation.

I think my options are to somehow pickle the object to disk once the array gets too big for RAM, e.g.

import pickle
import sys

# N = some size in bytes too large for RAM
# (for a pandas Series, in_memory_array.memory_usage(deep=True) is a more accurate byte count than sys.getsizeof)
if sys.getsizeof(in_memory_array) > N:
    with open('mypickle.pickle', 'wb') as f:
        pickle.dump(in_memory_array, f)

Otherwise, is there an out-of-core solution? The best case scenario would be to create some cap such that the object cannot grow larger than X GB in RAM.
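If you stay with the pickling approach, one way to get that cap is to flush the Series to a pickle file and reset it whenever it crosses a size threshold. Below is a minimal sketch of that idea, reusing the hypothetical list_of_pandas_dataframes and compute_something from the question; MAX_BYTES and the chunk filenames are purely illustrative.

import pickle
import pandas as pd

MAX_BYTES = 2 * 1024 ** 3            # illustrative cap: 2 GiB
chunk_paths = []                     # pickle files written so far
in_memory_array = pd.Series([], dtype=object)

for df in list_of_pandas_dataframes:
    new = df.apply(lambda row: compute_something(row), axis=1)
    in_memory_array = pd.concat([in_memory_array, new])

    # memory_usage(deep=True) reports the bytes actually held, unlike sys.getsizeof
    if in_memory_array.memory_usage(deep=True) > MAX_BYTES:
        path = f'chunk_{len(chunk_paths)}.pickle'
        with open(path, 'wb') as f:
            pickle.dump(in_memory_array, f)
        chunk_paths.append(path)
        in_memory_array = pd.Series([], dtype=object)    # reset so RAM usage stays bounded

# flush whatever is left at the end
if not in_memory_array.empty:
    path = f'chunk_{len(chunk_paths)}.pickle'
    with open(path, 'wb') as f:
        pickle.dump(in_memory_array, f)
    chunk_paths.append(path)

The results then live in a list of pickle files that can be reloaded and processed one chunk at a time.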

Recommended Answer

Check out this Python library: https://pypi.org/project/wendelin.core/. It allows you to work with arrays bigger than RAM and local disk.

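For comparison, plain NumPy also ships an out-of-core building block, numpy.memmap, which keeps the array backed by a file on disk and only pages the touched parts into RAM. This is not the wendelin.core API, just a rough sketch; the filename, dtype, array size, and chunk size below are illustrative.

import numpy as np

# create a disk-backed array; only the pages that are touched get loaded into RAM
n_rows = 100_000_000                       # illustrative size
arr = np.memmap('results.dat', dtype='float64', mode='w+', shape=(n_rows,))

# fill it in chunks so the working set stays small
chunk = 1_000_000
for start in range(0, n_rows, chunk):
    stop = min(start + chunk, n_rows)
    arr[start:stop] = np.random.random(stop - start)   # stand-in for the real per-row results

arr.flush()                                # make sure everything is written to disk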