Create large random boolean matrix with numpy

Views: 320 Published: 2020/5/18 19:53:40 Tags: python numpy random


Question

I am trying to create a huge boolean matrix that is randomly filled with True and False, where each entry is True with a given probability p. At first I used this code:

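import numpy as np

N = 30000
p = 0.1
# Note: with a=[False, True] and p=[p, 1-p], False is drawn with probability p and True with 1-p.
np.random.choice(a=[False, True], size=(N, N), p=[p, 1-p])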

Sadly, it does not seem to terminate for an N this large. So I tried splitting the work up and generating the matrix one row at a time:

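import numpy as np

N = 30000
p = 0.1
# np.empty defaults to float64, so the booleans are stored as 8-byte floats.
mask = np.empty((N, N))
for i in range(N):
    mask[i] = np.random.choice(a=[False, True], size=N, p=[p, 1-p])
    # Report progress every 100 rows.
    if i % 100 == 0:
        print(i)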

Now something strange happens (at least on my device): the first ~1100 rows are generated very quickly, but after that the code becomes horribly slow. Why is this happening? What am I missing here? Is there a better way to create a big matrix that has True entries with probability p and False entries with probability 1-p?

Edit: Many of you assumed that RAM would be the problem. The device that will run the code has almost 500GB of RAM, so this won't be an issue.

Recommended answer

The problem is your RAM: the values are stored in memory as the matrix is being created. I just created this matrix using this command:

np.random.choice(a=[False, True], size=(N, N), p=[p, 1-p])

I used an AWS i3 instance with 64GB of RAM and 8 cores. While creating this matrix, htop shows it taking up ~20GB of RAM. Here is a benchmark in case you care:

%time np.random.choice(a=[False, True], size=(N, N), p=[p, 1-p])

CPU times: user 18.3 s, sys: 3.4 s, total: 21.7 s
Wall time: 21.7 s


# mask is the preallocated global array from the question: mask = np.empty((N, N))
def mask_method(N, p):
    for i in range(N):
        mask[i] = np.random.choice(a=[False, True], size=N, p=[p, 1-p])
        if i % 100 == 0:
            print(i)

%time mask_method(N, p)

CPU times: user 20.9 s, sys: 1.55 s, total: 22.5 s
Wall time: 22.5 s

Note that the mask method only takes up ~9GB of RAM at its peak.
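As a rough sanity check on that figure, the size of an N-by-N array can be worked out from the element size alone. The sketch below assumes the mask was allocated with np.empty((N, N)) as in the question, which defaults to float64; the helper name array_gb is made up for illustration. For N = 30000 it gives 7.2 GB for float64 and 0.9 GB for bool, so the float64 buffer alone may account for most of the ~9GB observed.

```python
import numpy as np

N = 30000

def array_gb(n, dtype):
    """Size in GB (10**9 bytes) of an n-by-n array of the given dtype."""
    return n * n * np.dtype(dtype).itemsize / 1e9

# np.empty without an explicit dtype produces float64 (8 bytes per entry).
default_dtype = np.empty((2, 2)).dtype
float_gb = array_gb(N, default_dtype)

# A true boolean array needs only 1 byte per entry.
bool_gb = array_gb(N, bool)

print(default_dtype, float_gb, bool_gb)
```

Passing dtype=bool to np.empty would therefore cut the preallocated buffer to roughly an eighth of its size.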

The first method releases its RAM once the process is done, whereas the function method retains all of it.
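If peak memory is the concern, one possible alternative (not what was benchmarked above) is to compare uniform random floats against p and write the result into a preallocated bool array in row chunks. This is only a sketch: the function name random_bool_matrix and the chunk_rows parameter are made up for illustration, and it assumes NumPy 1.17 or later for np.random.default_rng. It follows the stated goal of True with probability p, whereas the snippets above pass p=[p, 1-p] and so draw False with probability p.

```python
import numpy as np

def random_bool_matrix(n, p, chunk_rows=1024, seed=None):
    """Return an (n, n) bool array whose entries are True with probability p."""
    rng = np.random.default_rng(seed)
    out = np.empty((n, n), dtype=bool)  # 1 byte per entry
    for start in range(0, n, chunk_rows):
        stop = min(start + chunk_rows, n)
        # Uniform floats in [0, 1); comparing against p gives True with probability p.
        # Only one chunk of temporary floats exists at a time.
        out[start:stop] = rng.random((stop - start, n)) < p
    return out

# Small demo size so the example runs quickly; use n=30000 for the real case.
mask = random_bool_matrix(2000, 0.1, seed=0)
print(mask.dtype, mask.shape, mask.mean())
```

The temporary float64 data is limited to chunk_rows * n entries, so the footprint should stay close to the size of the bool result itself.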
