NumPy: view a contiguous part of a non-contiguous array as a dtype of bigger size


Question

I was trying to generate an array of trigrams (i.e. combinations of three consecutive letters) from a very long char array:

import numpy as np

# the data is actually loaded from a source file
a = np.random.randint(0, 256, 2**28, 'B').view('c')

Since making a copy is inefficient (and causes problems such as cache misses), I generated the trigrams directly with stride tricks:

tri = np.lib.stride_tricks.as_strided(a, (len(a)-2,3), a.strides*2)

This generates a trigram array of shape (2**28-2, 3) in which each row is a trigram. Now I want to convert the trigrams to an array of strings (i.e. dtype S3) so that numpy displays them more "reasonably" (instead of as individual chars).

tri = tri.view('S3')

It raises an exception:

ValueError: To change to a dtype of a different size, the array must be C-contiguous

I understand that, in general, data should be contiguous in order to create a meaningful view, but this data is contiguous "where it should be": the three elements of each row are contiguous.
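To make the claim above concrete, here is a minimal sketch on a six-byte stand-in for the real data (the input `b"abcdef"` is an illustrative assumption). It prints the strides and contiguity flag of the strided array and shows that the bytes of a single row sit next to each other in memory.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

# Small stand-in for the 2**28-element array in the question.
a = np.frombuffer(b"abcdef", dtype="S1")
tri = as_strided(a, (len(a) - 2, 3), a.strides * 2)

print(tri.shape)                  # (4, 3)
print(tri.strides)                # (1, 1): rows overlap, so this is not C order
print(tri.flags["C_CONTIGUOUS"])  # False
print(tri[1].tobytes())           # b'bcd': one row is three adjacent bytes
print(np.shares_memory(a, tri))   # True: no copy was made
```

The rows overlap (each one starts one byte after the previous one), which is why the array as a whole cannot be C-contiguous even though every individual row is.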

So I'm wondering how to view a contiguous part of a non-contiguous np.ndarray as a dtype of bigger size. A more "standard" way would be better, but hackish ways are also welcome. It seems that I can set shape and strides freely with np.lib.stride_tricks.as_strided, but I can't force the dtype, which is the problem here.

Edit

A non-contiguous array can be made by simple slicing. For example:

np.empty((8, 4), 'uint32')[:, :2].view('uint64')

will throw the same exception as above (while from a memory point of view I should be able to do this). This case is much more common than my example above.

Recommended answer

If you have access to a contiguous array from which your non-contiguous one is derived, it should typically be possible to work around this limitation.

For example, your trigrams can be obtained like so:

>>> a = np.random.randint(0, 256, 2**28, 'B').view('c')
>>> a
array([b')', b'\xf2', b'\xf7', ..., b'\xf4', b'\xf1', b'z'], dtype='|S1')
>>> np.lib.stride_tricks.as_strided(a[:0].view('S3'), ((2**28)-2,), (1,))
array([b')\xf2\xf7', b'\xf2\xf7\x14', b'\xf7\x14\x1b', ...,
       b'\xc9\x14\xf4', b'\x14\xf4\xf1', b'\xf4\xf1z'], dtype='|S3')

In fact, this example demonstrates that all we need for view casting is a contiguous "stub" at the base of the memory buffer. Afterwards, because as_strided does not do many checks, we are essentially free to do whatever we like.
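As a small-scale check of the stub approach, the sketch below repeats it on a six-byte input and compares the result with trigrams built by copying. Both the input `b"abcdef"` and the copy-based `expected` array are illustrative assumptions, not part of the answer.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.frombuffer(b"abcdef", dtype="S1")

# Zero-length slice: it counts as contiguous, so the dtype change is accepted.
stub = a[:0].view("S3")
# Re-expand the stub: one S3 item per starting byte, advancing one byte each.
tri = as_strided(stub, (len(a) - 2,), (1,))

# Reference result built by copying, used only for comparison.
expected = np.array(
    [a[i:i + 3].tobytes() for i in range(len(a) - 2)], dtype="S3"
)

print(tri)
print(np.array_equal(tri, expected))
print(np.shares_memory(a, tri))
```

Because as_strided performs no bounds checks, the shape and strides passed to it must be chosen so that every item stays inside the original buffer; here the last item starts at byte 3 and ends at byte 5.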

It seems we can always get such a stub by slicing to a size-0 array. For your second example:

>>> X = np.empty((8, 4), 'uint32')[:, :2]
>>> np.lib.stride_tricks.as_strided(X[:0].view(np.uint64), (8, 1), X.strides)
array([[140133325248280],
       [             32],
       [       32083728],
       [       31978800],
       [              0],
       [       29686448],
       [             32],
       [       32362720]], dtype=uint64)
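The two examples above could be wrapped in one function. The following is only a sketch: `view_last_axis_as` is a hypothetical helper name, not a NumPy API, and it assumes the last axis of the input is contiguous and that its byte length is a multiple of the new item size. Depending on the NumPy version, the direct `.view` call may already be accepted when only the last axis is contiguous; this sketch does not rely on that.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def view_last_axis_as(arr, dtype):
    """Hypothetical helper: reinterpret the last axis of `arr` as `dtype`
    without copying, using the zero-size stub trick."""
    dtype = np.dtype(dtype)
    if arr.ndim == 0 or arr.strides[-1] != arr.itemsize:
        raise ValueError("the last axis must be contiguous")
    row_bytes = arr.shape[-1] * arr.itemsize
    if row_bytes % dtype.itemsize:
        raise ValueError("last-axis byte length is not a multiple of the new itemsize")
    # The size-0 slice keeps the data pointer and accepts the dtype change.
    stub = arr[:0].view(dtype)
    new_shape = arr.shape[:-1] + (row_bytes // dtype.itemsize,)
    new_strides = arr.strides[:-1] + (dtype.itemsize,)
    return as_strided(stub, new_shape, new_strides)


# Filled with known values instead of np.empty so the result can be checked.
base = np.arange(32, dtype=np.uint32).reshape(8, 4)
X = base[:, :2]  # non-contiguous: strides are (16, 4)
Y = view_last_axis_as(X, np.uint64)

# Copy-based reference, for comparison only.
expected = np.ascontiguousarray(X).view(np.uint64)

print(Y.shape)
print(np.array_equal(Y, expected))
print(np.shares_memory(base, Y))
```

The helper keeps the outer strides of the input and only rewrites the last axis, so it also covers the 1-D case in which the whole array is a single contiguous run.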
