Python "Too many indices for array"
Question
I am reading a file in Python using pandas and then saving it in a NumPy array. The file has 11303402 rows x 10 columns. I need to split the data for cross-validation, so I sliced it into an 11303402 x 9 array of examples and an 11303402 x 1 array of labels. The following is the code:
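import numpy as np
import pandas as pd

tdata = pd.read_csv('train.csv')
tdata.columns = ['Arrival_Time', 'Creation_Time', 'x', 'y', 'z', 'User', 'Model', 'Device', 'sensor', 'gt']
User_Data = np.array(tdata)
features = User_Data[:, 0:9]   # first 9 columns: examples
labels = User_Data[:, 9:10]    # last column, kept 2-D: shape (n, 1)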
The error comes from the following code:
classes=np.unique(labels)
idx=labels==classes[0]
Yt=labels[idx]
Xt=features[idx,:]
On the line:
Xt=features[idx,:]
it says "Too many indices for array".
The shapes of all 3 data sets are:
print(np.shape(tdata))     # (11303402, 10)
print(np.shape(features))  # (11303402, 9)
print(np.shape(labels))    # (11303402, 1)
If anyone knows the problem, please help.
Recommended answer
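The problem is that idx has shape (11303402, 1), because the logical comparison returns an array of the same shape as labels. A two-dimensional boolean index uses up both dimensions of features, so the trailing : is one index too many. The quick workaround is to index with the first column of idx, which is one-dimensional: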
Xt=features[idx[:,0],:]
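To see the shapes involved, here is a minimal sketch that reproduces the error and the fix on a small synthetic array. The 4 x 3 data and the names flat_labels, mask, Xt_alt and Yt_alt are illustrative assumptions, not part of the original question. It also shows one possible alternative: flattening labels once so that every mask derived from it is already one-dimensional.

```python
import numpy as np

# Small synthetic stand-in for the real data (values are made up).
data = np.array([
    [1.0, 2.0, 0.0],
    [3.0, 4.0, 1.0],
    [5.0, 6.0, 0.0],
    [7.0, 8.0, 1.0],
])
features = data[:, 0:2]   # shape (4, 2)
labels = data[:, 2:3]     # shape (4, 1): slicing with 2:3 keeps the column 2-D

classes = np.unique(labels)
idx = labels == classes[0]        # boolean mask, shape (4, 1)

# A 2-D mask plus ':' asks for three indices on a 2-D array.
try:
    features[idx, :]
    raised = False
except IndexError:
    raised = True

# Fix from the answer: take the first column of the mask so it is 1-D.
Xt = features[idx[:, 0], :]       # shape (2, 2)

# Alternative (an assumption, not from the answer): flatten the labels once,
# so every later mask is 1-D.
flat_labels = labels.ravel()      # shape (4,)
mask = flat_labels == classes[0]
Xt_alt = features[mask, :]
Yt_alt = flat_labels[mask]
```

Slicing the label column as User_Data[:, 9] instead of User_Data[:, 9:10] would likewise give a one-dimensional labels array from the start.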