Packing a boolean array requires going through int (numpy 1.8.2)

Question

I'm looking for a more compact way to store booleans. numpy internally uses 8 bits to store one boolean, but np.packbits lets you pack them into single bits, which is pretty cool.

The problem is that to pack a 32e6-byte boolean array into a 4e6-byte array, we first need to spend 256e6 bytes converting the boolean array to an int array!

In [1]: db_bool = np.array(np.random.randint(2, size=(int(2e6), 16)), dtype=bool)
In [2]: db_int = np.asarray(db_bool, dtype=int)
In [3]: db_packed = np.packbits(db_int, axis=0)
In [4]: db_bool.nbytes, db_int.nbytes, db_packed.nbytes
Out[4]: (32000000, 256000000, 4000000)

There is a year-old open issue about this in the numpy tracker (see https://github.com/numpy/numpy/issues/5377 ).

Does anyone have a solution or a better workaround?

Here is the traceback when we try to do it the right way, packing the boolean array directly:

In [28]: db_pb = np.packbits(db_bool)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-28-3715e167166b> in <module>()
----> 1 db_pb = np.packbits(db_bool)
TypeError: Expected an input array of integer data type

PS: I will give bitarray a try, but I would have preferred to do this in pure numpy.

Recommended answer

There's no need to convert your boolean array to the native int dtype (which will be 64 bit on x86_64). You can avoid copying your boolean array by viewing it as np.uint8, which also uses a single byte per element:

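# View the booleans as uint8 (no copy), then pack 8 values into each byte
packed = np.packbits(db_bool.view(np.uint8))

# Unpack, drop any padding bits, restore the shape, and view as booleans again
# (the builtin bool is used because the np.bool alias is missing from some numpy releases)
unpacked = np.unpackbits(packed)[:db_bool.size].reshape(db_bool.shape).view(bool)

print(np.all(db_bool == unpacked))
# True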
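
For anyone who wants to try this without the variables from the question, here is a minimal self-contained sketch of the same idea. The array is a much smaller stand-in (2000 rows instead of 2e6, a size picked only for illustration), and it packs along axis 0 as the question does. The np.shares_memory check is an addition of mine to confirm that the uint8 view does not copy the data.

```python
import numpy as np

# Small stand-in for the array in the question (size chosen for illustration).
np.random.seed(0)
db_bool = np.random.randint(2, size=(2000, 16)).astype(bool)

# Viewing as uint8 reinterprets the same buffer; no data is copied.
as_u8 = db_bool.view(np.uint8)
shares = np.shares_memory(db_bool, as_u8)

# Pack along axis 0, as in the question: every 8 rows collapse into 1.
packed = np.packbits(as_u8, axis=0)

# Unpack, trim any padding rows, and view the 0/1 bytes as booleans again.
unpacked = np.unpackbits(packed, axis=0)[:db_bool.shape[0]].view(bool)

print(shares, db_bool.nbytes, packed.nbytes, np.array_equal(db_bool, unpacked))
# True 32000 4000 True
```

The packed array is 8 times smaller than the boolean one, and the only temporary is the view, which costs no extra memory.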

Also, np.packbits should now work directly on boolean arrays, as of this commit from over a year ago (numpy v1.10.0 and newer).
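
As a rough illustration of the direct route, assuming numpy 1.10.0 or newer is installed, the sketch below packs a boolean array without any view or conversion. The flags array is made up for the example, and its length (10) is deliberately not a multiple of 8 to show the zero padding that has to be trimmed after unpacking.

```python
import numpy as np

# Hypothetical 10-element boolean array (not from the question).
flags = np.array([True, False, True, True, False,
                  False, True, False, True, True])

# On numpy >= 1.10 a boolean array can be passed to packbits directly.
packed = np.packbits(flags)

# The 10 bits are zero-padded to 16, so trim back to the original length.
restored = np.unpackbits(packed)[:flags.size].astype(bool)

print(packed, np.array_equal(flags, restored))
# [178 192] True
```

On older releases such as 1.8.2 the call to np.packbits(flags) raises the TypeError shown in the question, so the uint8 view remains the workaround there.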
