Predicting with chi squared kernel for multilabel using sklearn
Question
I'm trying to get predictions for an SVM using a precomputed chi-squared kernel. However, I am getting issues when trying to run clf.predict().
from sklearn import preprocessing
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.svm import SVC

min_max_scaler = preprocessing.MinMaxScaler()
X_train_scaled = min_max_scaler.fit_transform(features_train)
X_test_scaled = min_max_scaler.transform(features_test)
K = chi2_kernel(X_train_scaled)
svm = SVC(kernel='precomputed', cache_size=1000).fit(K, labels_train)
y_pred_chi2 = svm.predict(X_test_scaled)
The error I get is as follows:
ValueError: bad input shape (4627L, 20L)
I am guessing this issue is because of the multi-label, so I trained the classifier for only 1 category by doing the following:
svm = SVC(kernel='precomputed', cache_size=1000).fit(K, labels_train[:, 0])
However, when trying to run clf.predict(X_test_scaled), I get the error:
ValueError: X.shape[1] = 44604 should be equal to 4627, the number of samples at training time
Why do the test samples have to be the same number as the training samples?
Here is the shape of the relevant matrices (the features have 44604 dimensions and there are 20 categories):
X_train_scaled.shape : (4627L, 44604L)
X_test_scaled.shape : (4637L, 44604L)
K.shape : (4627L, 4627L)
labels_train.shape : (4627L, 20L)
On a side note, is it normal that there is an L next to the shape sizes of these matrices?
Recommended answer
You need to give the predict function the kernel between the test data and the training data. The easiest way to do that is to pass a callable as the kernel parameter: kernel=chi2_kernel.
Using
K_test = chi2_kernel(X_test_scaled)
does not work. It has to be
K_test = chi2_kernel(X_test_scaled, X_train_scaled)
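To make the two options above concrete, here is a minimal sketch on small synthetic data. The arrays X_train, X_test, y_train and Y_train are made-up stand-ins for the scaled features and labels in the question, not the asker's data. The last part, which wraps the SVC in OneVsRestClassifier to handle the 20-column label matrix, is an assumption on my side and is not part of the answer above; it is one common way to train one binary SVM per label.

```python
import numpy as np
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Synthetic stand-ins for the scaled features (chi2_kernel needs non-negative input).
rng = np.random.RandomState(0)
X_train = rng.rand(40, 6)
X_test = rng.rand(10, 6)
y_train = (X_train[:, 0] > 0.5).astype(int)       # a single binary label
Y_train = (X_train[:, :3] > 0.5).astype(int)      # a 3-column multilabel indicator matrix

# Option 1: precomputed kernel. The test kernel is test-vs-train,
# so its shape is (n_test, n_train), not (n_test, n_test).
K_train = chi2_kernel(X_train)
K_test = chi2_kernel(X_test, X_train)
svm_pre = SVC(kernel="precomputed").fit(K_train, y_train)
pred_pre = svm_pre.predict(K_test)

# Option 2: pass the kernel as a callable and let SVC build both
# kernel matrices itself from the raw feature arrays.
svm_call = SVC(kernel=chi2_kernel).fit(X_train, y_train)
pred_call = svm_call.predict(X_test)

# Assumption (not from the answer): one binary SVM per label column.
ovr = OneVsRestClassifier(SVC(kernel=chi2_kernel)).fit(X_train, Y_train)
pred_multi = ovr.predict(X_test)                  # shape (n_test, n_labels)
```

With the callable, predict takes the raw test features; with kernel="precomputed", it takes the (n_test, n_train) kernel matrix instead.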