Predicting with chi squared kernel for multilabel using sklearn

Problem description

I'm trying to get predictions from an SVM using a precomputed chi-squared kernel. However, I run into problems when calling clf.predict().

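from sklearn import preprocessing
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.svm import SVC

# Scale the features to [0, 1], fitting the scaler on the training set only
min_max_scaler = preprocessing.MinMaxScaler()
X_train_scaled = min_max_scaler.fit_transform(features_train)
X_test_scaled = min_max_scaler.transform(features_test)

# Precompute the chi-squared kernel on the training data and fit the SVM
K = chi2_kernel(X_train_scaled)
svm = SVC(kernel='precomputed', cache_size=1000).fit(K, labels_train)
y_pred_chi2 = svm.predict(X_test_scaled)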

The error I get is the following:

ValueError: bad input shape (4627L, 20L)

I guessed that this was caused by the multi-label targets, so I trained the classifier on a single category instead:

svm = SVC(kernel='precomputed', cache_size=1000).fit(K, labels_train[:, 0])

However, when I then run clf.predict(X_test_scaled), I get the error:

ValueError: X.shape[1] = 44604 should be equal to 4627, the number of samples at training time

Why does the number of test samples have to match the number of training samples?

Here are the shapes of the relevant matrices (the features have 44604 dimensions and there are 20 categories):

X_train_scaled.shape    : (4627L, 44604L)
X_test_scaled.shape     : (4637L, 44604L)
K.shape                 : (4627L, 4627L)
labels_train.shape      : (4627L, 20L)

On a side note, is it normal for there to be an L next to the sizes in these shapes?

Recommended answer

You need to give the predict function the kernel between the test data and the training data. The easiest way to do that is to pass a callable as the kernel parameter, kernel=chi2_kernel, so that SVC computes the kernel itself and you fit and predict on the scaled features directly. Using

K_test = chi2_kernel(X_test_scaled)

won't work. It needs to be

K_test = chi2_kernel(X_test_scaled, X_train_scaled)
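
To make the two options above concrete, here is a minimal sketch on small synthetic data. The array sizes, the random features, and the variable names svm_pre, svm_call and ovr are assumptions for illustration, not the asker's data. Two further points go beyond the answer and should be treated as suggestions: the scaled test features are clipped to [0, 1], because chi2_kernel rejects negative values and a scaler fitted on the training set can produce them on test data; and the multi-label case is handled by wrapping SVC in OneVsRestClassifier, since SVC on its own expects a 1-D label vector.

```python
import numpy as np
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-ins for the question's data (assumed, much smaller sizes).
rng = np.random.RandomState(0)
features_train = rng.rand(60, 8)
features_test = rng.rand(25, 8)
# Multi-label indicator matrix with 3 categories (hypothetical labels).
labels_train = (features_train[:, :3] > 0.5).astype(int)

scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(features_train)
# Clip so that test values outside the training range cannot go negative,
# which chi2_kernel would reject.
X_test_scaled = np.clip(scaler.transform(features_test), 0.0, 1.0)

# Option 1: precomputed kernel. The test kernel is computed between the
# test samples and the training samples, so its shape is (n_test, n_train).
K_train = chi2_kernel(X_train_scaled)
K_test = chi2_kernel(X_test_scaled, X_train_scaled)
svm_pre = SVC(kernel="precomputed").fit(K_train, labels_train[:, 0])
pred_pre = svm_pre.predict(K_test)

# Option 2: pass the kernel as a callable and work on the features directly.
svm_call = SVC(kernel=chi2_kernel).fit(X_train_scaled, labels_train[:, 0])
pred_call = svm_call.predict(X_test_scaled)

# Multi-label: one binary SVC per category (one possible approach).
ovr = OneVsRestClassifier(SVC(kernel=chi2_kernel))
ovr.fit(X_train_scaled, labels_train)
pred_multi = ovr.predict(X_test_scaled)

print(K_train.shape, K_test.shape, pred_pre.shape, pred_multi.shape)
```

With a precomputed kernel, predict expects one row per test sample and one column per training sample, which is why the second error in the question compares X.shape[1] with the number of training samples.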
