尝试创建带标签的numpy数组 [英] Trying to create a labeled numpy array

查看:417
本文介绍了尝试创建带标签的numpy数组的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想要一个带有值的numpy数组和每个值的对应标签.我正在使用此数组进行线性回归,它将成为方程式y = Xb + error中的我的X数据向量.

I want to have a numpy array with values and corresponding labels for each value. I am using this array for linear regression and it will be my X data vector in the equation y = Xb + error.

我的X向量包含大约20个变量,我希望每个变量都可以像X['variable1']这样通过名称进行引用.最初,我使用字典来执行此操作,但意识到用于线性回归的scikit库需要一个numpy矩阵,因此我正在尝试构建一个带有标签的numpy数组.

My X vector consists of about 20 variables, each of which I would like to be able to reference by name like so X['variable1']. I was initially using a dictionary to do this but realized that the scikit library for linear regression requires a numpy matrix, so I am trying to build a numpy array that is labeled.

我不断收到错误消息:

TypeError: a bytes-like object is required, not 'int'.

这就是我在做什么:

X = np.array([3],dtype=[('label1','int')])

我最终希望有20个标记值,如下所示:

I eventually want to have 20 labeled values, something like this:

X = np.array([3,40,7,2,...],
             dtype=[('label1',int'),('label2','int'),('label3','int')...])

非常感谢您对此处语法的任何帮助.谢谢!

Would really appreciate any help on the syntax here. Thanks!

推荐答案

创建具有值的结构化数组的正确方法是使用元组列表:

The correct way to create a structured array, with values, is with a list of tuples:

In [55]: X
Out[55]: 
array([(3,)], 
      dtype=[('label1', '<i4')])

In [56]: X=np.array([(3,4)],dtype=[('label1',int),('label2',int)])

In [57]: X
Out[57]: 
array([(3, 4)], 
      dtype=[('label1', '<i4'), ('label2', '<i4')])

但是我要提醒您,这样的数组不是2d(或矩阵),而是1d,其中包含字段:

But I should caution you that such array is not 2d (or matrix), it is 1d with fields:

In [58]: X.shape
Out[58]: (1,)

In [59]: X.dtype
Out[59]: dtype([('label1', '<i4'), ('label2', '<i4')])

而且您不能跨领域进行数学运算; X*2X.sum()将产生错误.在像y = X*b + error这样的方程式中使用X是没有希望的.

And you can't do math across fields; X*2 and X.sum() will produce errors. Using X in an equation like y = X*b + error will be hopeless.

您可能最好使用实数二维数组,并在您的头部或字典中进行标签和列号之间的映射.

You are probably better off working with real 2d numeric arrays, and do the mapping between labels and column numbers in your head, or with a dictionary.

或使用熊猫.

这篇关于尝试创建带标签的numpy数组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆