Create a large random boolean matrix with NumPy

Question

I am trying to create a huge boolean matrix which is randomly filled with True and False with a given probability p. At first I used this code:

import numpy as np

N = 30000
p = 0.1
# Note: with a=[False, True] and p=[p, 1-p], False is drawn with probability p
np.random.choice(a=[False, True], size=(N, N), p=[p, 1-p])
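The question says entries should be True with probability p, but in `np.random.choice` the i-th weight in `p` belongs to the i-th element of `a`, so `p=[p, 1-p]` makes False the value drawn with probability p. A quick check on a small 1-D sample (the sample size and seed are arbitrary choices for illustration):

```python
import numpy as np

np.random.seed(0)  # fixed seed so the check is reproducible
p = 0.1
n_samples = 100_000

# Same call shape as the question: weights [p, 1-p] attach to [False, True].
sample = np.random.choice(a=[False, True], size=n_samples, p=[p, 1 - p])
print(f"fraction True: {sample.mean():.3f}")  # near 0.9, i.e. 1 - p

# Swapping the weights gives True with probability p.
fixed = np.random.choice(a=[False, True], size=n_samples, p=[1 - p, p])
print(f"fraction True: {fixed.mean():.3f}")  # near 0.1, i.e. p
```

If True should occur with probability p, the weights need to be `p=[1-p, p]`.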

Sadly, it does not seem to terminate for an N this large, so I tried splitting the work into generating one row at a time:

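import numpy as np

N = 30000
p = 0.1
mask = np.empty((N, N))  # dtype defaults to float64: about 7.2 GB for N = 30000
for i in range(N):
    # Generate one row at a time
    mask[i] = np.random.choice(a=[False, True], size=N, p=[p, 1-p])
    if i % 100 == 0:
        print(i)  # progress indicator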
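`np.empty((N, N))` without a `dtype` allocates float64, so the row-by-row version stores each boolean as an 8-byte 0.0/1.0. A quick size comparison, computed from the item sizes rather than by allocating the full buffer:

```python
import numpy as np

N = 30000

# np.empty defaults to float64, so every element costs 8 bytes.
print(np.empty((2, 2)).dtype)  # float64

float_bytes = N * N * np.dtype(np.float64).itemsize
bool_bytes = N * N * np.dtype(np.bool_).itemsize
print(f"float64 buffer: {float_bytes / 1e9:.1f} GB")  # 7.2 GB
print(f"bool buffer:    {bool_bytes / 1e9:.1f} GB")   # 0.9 GB

# Preallocating with dtype=bool stores the mask at one byte per entry.
small = np.empty((3, 3), dtype=bool)
print(small.dtype, small.nbytes)  # bool 9
```

Passing `dtype=bool` to `np.empty` cuts the preallocated buffer by a factor of eight.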

Now something strange happens (at least on my machine): the first ~1100 rows are generated very quickly, but after that the code becomes horribly slow. Why is this happening? What am I missing here? Are there better ways to create a big matrix whose entries are True with probability p and False with probability 1-p?
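One commonly used alternative (a sketch, not the answerer's method) is to compare uniform random floats with `p`: `rng.random(shape) < p` is True with probability p. Generating in row blocks and writing the comparison straight into a preallocated `bool` array keeps the float64 temporary small. The helper name `random_bool_matrix` and its `chunk_rows` parameter are hypothetical; `np.random.default_rng` requires NumPy 1.17 or later.

```python
import numpy as np

def random_bool_matrix(n_rows, n_cols, p, chunk_rows=1000, seed=None):
    """Hypothetical helper: (n_rows, n_cols) bool array, True with probability p.

    Rows are generated in blocks of `chunk_rows`, so the float64 temporary
    never exceeds chunk_rows * n_cols * 8 bytes.
    """
    rng = np.random.default_rng(seed)
    out = np.empty((n_rows, n_cols), dtype=bool)  # 1 byte per entry
    for start in range(0, n_rows, chunk_rows):
        stop = min(start + chunk_rows, n_rows)
        # Uniform floats in [0, 1) fall below p with probability p;
        # the comparison result is written directly into the output slice.
        np.less(rng.random((stop - start, n_cols)), p, out=out[start:stop])
    return out

mask = random_bool_matrix(5000, 5000, p=0.1, seed=0)
print(mask.dtype, mask.shape)
print(f"fraction True: {mask.mean():.3f}")
```

For N = 30000 with the default `chunk_rows=1000`, the result takes about 0.9 GB and each float64 temporary about 240 MB. Larger blocks mean fewer Python-level iterations at the cost of a larger temporary.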

Edit: Since many of you assumed RAM would be the problem: the machine that will run this code has almost 500 GB of RAM, so that should not be an issue.

Recommended answer

The problem is your RAM: the values are stored in memory while the matrix is being created. I just created this matrix using the following command:

np.random.choice(a=[False, True], size=(N, N), p=[p, 1-p])

I used an AWS i3 instance with 64 GB of RAM and 8 cores. While creating this matrix, htop shows it using about 20 GB of RAM. Here is a benchmark, in case you are interested:
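The ~20 GB figure is far more than the 0.9 GB the bool result itself needs. Assuming the legacy `np.random.choice` implementation with `p` draws one float64 uniform sample per output element, maps the samples to integer indices with `searchsorted` on the normalized cumulative weights, and then indexes `a`, a back-of-the-envelope estimate of the temporaries looks like this (the small 4 x 4 demonstration of the mechanism is illustrative, not NumPy's actual code):

```python
import numpy as np

N = 30000
n = N * N

# Assumed temporaries of np.random.choice(a, size=(N, N), p=...) (legacy path):
uniform_bytes = n * np.dtype(np.float64).itemsize  # uniform samples in [0, 1)
index_bytes = n * np.dtype(np.intp).itemsize       # indices into a (8 bytes on 64-bit)
result_bytes = n * np.dtype(np.bool_).itemsize     # final bool matrix
total = uniform_bytes + index_bytes + result_bytes
print(f"approx. peak: {total / 1e9:.1f} GB")  # 15.3 GB on a 64-bit build

# Small-scale illustration of the assumed sampling mechanism.
p = 0.1
a = np.array([False, True])
cdf = np.cumsum([p, 1 - p])
cdf /= cdf[-1]                                   # normalize the cumulative weights
u = np.random.default_rng(0).random((4, 4))      # float64 uniforms
idx = np.searchsorted(cdf, u, side="right")      # integer indices (intp)
sample = a[idx]                                  # bool result
```

Under that assumption, the float64 and integer intermediates alone account for most of the memory htop reports.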

%time np.random.choice(a=[False, True], size=(N, N), p=[p, 1-p])

CPU times: user 18.3 s, sys: 3.4 s, total: 21.7 s
Wall time: 21.7 s


def mask_method(N, p):
    # Fills the global `mask` array preallocated in the question
    for i in range(N):
        mask[i] = np.random.choice(a=[False, True], size=N, p=[p, 1-p])
        if i % 100 == 0:
            print(i)

%time mask_method(N, p)

CPU times: user 20.9 s, sys: 1.55 s, total: 22.5 s
Wall time: 22.5 s
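`%time` (or `time` with IPython's automagic) only works inside IPython/Jupyter. To repeat a comparison like this in a plain Python script, a minimal sketch with `time.perf_counter` could look like the following. The scaled-down `N = 2000` and the variant `mask_method(mask, p)`, which takes the buffer as an argument instead of using a global, are assumptions for illustration, and timings will vary by machine.

```python
import time
import numpy as np

def mask_method(mask, p):
    """Variant of the answer's function that takes the buffer as an argument."""
    n_rows, n_cols = mask.shape
    for i in range(n_rows):
        mask[i] = np.random.choice(a=[False, True], size=n_cols, p=[p, 1 - p])

N = 2000  # scaled down from 30000 so the comparison runs in seconds
p = 0.1

start = time.perf_counter()
full = np.random.choice(a=[False, True], size=(N, N), p=[p, 1 - p])
t_full = time.perf_counter() - start

mask = np.empty((N, N), dtype=bool)
start = time.perf_counter()
mask_method(mask, p)
t_rows = time.perf_counter() - start

print(f"single call: {t_full:.3f} s, row by row: {t_rows:.3f} s")
```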

Note that the mask method only uses about 9 GB of RAM at its peak.

The first method releases the RAM once it is done, whereas the function method retains all of it.
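One plausible reading of this last point (an interpretation, not something the answer states) is that the function method writes into the global `mask`, and that buffer stays allocated for as long as the name `mask` refers to it. In CPython an array's memory is released as soon as its last reference is dropped, which a weak reference can make visible. The helper `fill_rows` and the 1000 x 1000 size are hypothetical:

```python
import weakref
import numpy as np

def fill_rows(mask, p):
    """Hypothetical helper: fill `mask` row by row, True with probability p."""
    for i in range(mask.shape[0]):
        mask[i] = np.random.random(mask.shape[1]) < p

mask = np.empty((1000, 1000), dtype=bool)
fill_rows(mask, 0.1)
ref = weakref.ref(mask)   # observes the array without keeping it alive

print(ref() is not None)  # True: the name `mask` still holds the buffer
del mask                  # drop the last strong reference
print(ref() is None)      # True in CPython: the array has been freed
```

So to free a large preallocated mask in a long-running session, `del` the name (and any other references to it) once it is no longer needed.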
