Shared memory in multiprocessing

Question

I have three large lists. The first contains bitarrays (module bitarray 0.8.0) and the other two contain arrays of integers.

l1=[bitarray 1, bitarray 2, ... ,bitarray n]
l2=[array 1, array 2, ... , array n]
l3=[array 1, array 2, ... , array n]

These data structures take quite a bit of RAM (~16 GB total).

If I start 12 sub-processes using:

multiprocessing.Process(target=someFunction, args=(l1,l2,l3))

does this mean that l1, l2 and l3 will be copied for each sub-process, or will the sub-processes share these lists? Or, to be more direct, will I use 16 GB or 192 GB of RAM?

someFunction will read some values from these lists and then perform some calculations based on the values read. The results will be returned to the parent process. The lists l1, l2 and l3 will not be modified by someFunction.

Therefore I would assume that the sub-processes do not need to, and would not, copy these huge lists, but would instead just share them with the parent, meaning that the program would take 16 GB of RAM (regardless of how many sub-processes I start) due to the copy-on-write approach under Linux. Am I correct, or am I missing something that would cause the lists to be copied?

EDIT: I am still confused after reading a bit more on the subject. On the one hand, Linux uses copy-on-write, which should mean that no data is copied. On the other hand, accessing an object changes its ref-count (I am still unsure why, and what that means). Even so, will the entire object be copied?
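The ref-count behaviour can be observed directly with sys.getrefcount. In CPython the reference count is stored inside the object itself, so even a read-only access that temporarily binds the object writes to the object's memory page, and that write is what defeats copy-on-write. A minimal illustration (the reported numbers are CPython-specific):

```python
import sys

x = [1, 2, 3]
# getrefcount reports one extra reference for its own argument,
# so a fresh object bound to a single name reports 2 in CPython.
print(sys.getrefcount(x))  # → 2

y = x                      # binding another name bumps the count
print(sys.getrefcount(x))  # → 3
```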

For example, if I define someFunction as follows:

import random

def someFunction(list1, list2, list3):
    i = random.randint(0, 99999)
    print(list1[i], list2[i], list3[i])

Would using this function mean that l1, l2 and l3 will be copied entirely for each sub-process?

Is there a way to check this?
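One way to check is to compare the child's resident set size before and after it reads the inherited data. A minimal sketch (Linux only, and it assumes the default fork start method so that args are inherited rather than pickled; the function names are illustrative):

```python
import resource
from multiprocessing import Process

def rss_kib():
    # Peak resident set size of the calling process (KiB on Linux).
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

def child(data):
    before = rss_kib()
    total = sum(data)  # reading every element touches every refcount
    after = rss_kib()
    print("RSS grew by ~%d KiB while reading %d ints" % (after - before, len(data)))

if __name__ == "__main__":
    big = list(range(2_000_000))  # tens of MB allocated in the parent
    p = Process(target=child, args=(big,))
    p.start()
    p.join()
```

If copy-on-write were preserved, the growth would be small; in practice the refcount writes dirty nearly every page holding the int objects.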

EDIT2: After reading a bit more and monitoring the total memory usage of the system while the sub-processes are running, it seems that entire objects are indeed copied for each sub-process, and it seems to be because of reference counting.

The reference counting for l1, l2 and l3 is actually unneeded in my program, because l1, l2 and l3 will be kept in memory (unchanged) until the parent process exits. There is no need to free the memory used by these lists until then. In fact, I know for sure that the reference counts will remain above 0 (for these lists and every object in them) until the program exits.

So now the question becomes: how can I make sure that the objects will not be copied to each sub-process? Can I perhaps disable reference counting for these lists and every object in them?

EDIT3: Just an additional note. The sub-processes do not need to modify l1, l2, l3 or any objects in these lists. They only need to be able to reference some of these objects without causing the memory to be copied for each sub-process.

Answer

Generally speaking, there are two ways to share the same data:

  • Multithreading
  • Shared memory

Python's multithreading is not suitable for CPU-bound tasks (because of the GIL), so the usual solution in that case is to go on with multiprocessing. However, with this solution you need to explicitly share the data, using multiprocessing.Value and multiprocessing.Array.

Note that, usually, sharing data between processes may not be the best choice, because of all the synchronization issues; an approach involving actors exchanging messages is usually seen as a better choice. See also the Python documentation:

As mentioned above, when doing concurrent programming it is usually best to avoid using shared state as far as possible. This is particularly true when using multiple processes.

However, if you really do need to use some shared data then multiprocessing provides a couple of ways of doing so.
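The message-passing alternative mentioned above can be sketched with multiprocessing.Queue: instead of sharing the lists, the parent sends each worker only the values (or indices) it needs and collects the results back. The worker function and queue names here are illustrative, not part of the original question:

```python
from multiprocessing import Process, Queue

def worker(tasks, results):
    # Actor-style loop: consume messages until the None sentinel arrives.
    for item in iter(tasks.get, None):
        results.put(item * item)

if __name__ == "__main__":
    tasks, results = Queue(), Queue()
    p = Process(target=worker, args=(tasks, results))
    p.start()
    for i in range(3):
        tasks.put(i)
    tasks.put(None)  # tell the worker to stop
    p.join()
    print(sorted(results.get() for _ in range(3)))  # → [0, 1, 4]
```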

In your case, you need to wrap l1, l2 and l3 in some way understandable by multiprocessing (e.g. by using a multiprocessing.Array), and then pass them as parameters.
Note also that, since you said you do not need write access, you should pass lock=False while creating the objects, or all access will still be serialized.
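A minimal sketch of that suggestion for one of the integer lists (the bitarray objects would first need converting to something ctypes-compatible, such as a byte buffer; the variable names are illustrative):

```python
from multiprocessing import Process, Array

def someFunction(shared_l2):
    # Reads index the shared ctypes buffer directly; no per-process copy.
    print(shared_l2[0], shared_l2[-1])

if __name__ == "__main__":
    l2 = [10, 20, 30, 40]
    # 'i' = C int; lock=False because access is read-only.
    shared_l2 = Array('i', l2, lock=False)
    p = Process(target=someFunction, args=(shared_l2,))
    p.start()
    p.join()
```

With lock=False, Array returns a raw ctypes array with no synchronization wrapper, which is safe here precisely because the sub-processes never write to it.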
