numpy.concatenate on record arrays fails when the arrays have strings of different lengths

Question

Concatenating record arrays fails when they share a string field whose length differs between the arrays.

As the following example shows, concatenate works when 'f1' has the same length in both arrays but fails when it does not.

In [1]: import numpy as np

In [2]: a = np.core.records.fromarrays( ([1,2], ["one","two"]) )

In [3]: b = np.core.records.fromarrays( ([3,4,5], ["three","four","three"]) )

In [4]: c = np.core.records.fromarrays( ([6], ["six"]) )

In [5]: np.concatenate( (a,c) )
Out[5]: 
array([(1, 'one'), (2, 'two'), (6, 'six')], 
      dtype=[('f0', '<i8'), ('f1', '|S3')])

In [6]: np.concatenate( (a,b) )
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)

/u/jegannas/<ipython console> in <module>()

TypeError: expected a readable buffer object

However, if we concatenate just the field arrays (not the records), it succeeds even though the strings have different sizes.

In [8]: np.concatenate( (a['f1'], b['f1']) )
Out[8]: 
array(['one', 'two', 'three', 'four', 'three'], 
      dtype='|S5')

Is this a bug in concatenate when concatenating records, or is this the expected behavior? The only workaround I have found is the following:

In [10]: np.concatenate( (a.astype(b.dtype), b) )
Out[10]: 
array([(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four'), (5, 'three')], 
      dtype=[('f0', '<i8'), ('f1', '|S5')])

The trouble is that I have to go through all the recarrays I am concatenating, find the largest string length, and cast to that. If the record array has more than one string column, I have to track the maximum length for each of them as well.
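A minimal sketch of automating that bookkeeping, assuming every input has the same field names in the same order: promote each field across all inputs with np.result_type (for string fields this picks the widest width), cast every array to the resulting common dtype, and only then concatenate. The helper name concat_records is hypothetical and not part of NumPy. Under Python 3 the string fields appear as '<U…' rather than '|S…'.

```python
import numpy as np


def concat_records(arrays):
    """Concatenate structured arrays whose fields match by name but may
    differ in string width (hypothetical helper, not a NumPy function)."""
    names = arrays[0].dtype.names
    # Promote each field across all inputs; for strings this yields the
    # widest itemsize, so no value is truncated by the cast below.
    common = np.dtype([
        (name, np.result_type(*[arr.dtype[name] for arr in arrays]))
        for name in names
    ])
    # Cast every input to the common dtype, then concatenate as usual.
    return np.concatenate([arr.astype(common) for arr in arrays])


a = np.rec.fromarrays(([1, 2], ["one", "two"]))
b = np.rec.fromarrays(([3, 4, 5], ["three", "four", "three"]))

out = concat_records([a, b])
print(out.dtype)
print(out["f1"])
```

This handles any number of string columns, because every field is promoted independently.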

What do you think is the best way to overcome this, at least for now?

Recommended answer

To post a complete answer: as Pierre GM suggested, the module

import numpy.lib.recfunctions

provides a solution. The function that does what you want, however, is:

numpy.lib.recfunctions.stack_arrays((a,b), autoconvert=True, usemask=False)

(usemask=False just avoids creating a masked array, which you are probably not using. The important part is autoconvert=True, which forces the conversion of a's dtype from "|S3" to "|S5".)
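For reference, a self-contained sketch of that call on the arrays from the question. It builds the arrays with np.rec.fromarrays, the public spelling of the same constructor, and assumes Python 3, where the string field is reported as '<U5' instead of '|S5'.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

a = np.rec.fromarrays(([1, 2], ["one", "two"]))
b = np.rec.fromarrays(([3, 4, 5], ["three", "four", "three"]))

# autoconvert=True widens 'f1' to the larger of the two string dtypes;
# usemask=False returns a plain ndarray instead of a masked array.
stacked = rfn.stack_arrays((a, b), autoconvert=True, usemask=False)

print(stacked.dtype)
print(stacked["f1"])
```

Passing asrecarray=True as well should return a recarray, if attribute-style field access is needed on the result.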
