SKLearn Kernel PCA "Precomputed" argument
I am trying to perform Kernel PCA using scikit-learn, using a kernel that is not in their implementation (and a custom input format that is recognized by this kernel). It would probably be easiest if I could just compute the kernel ahead of time, save it, and then use it in Kernel PCA.
The kernel="precomputed" option of KernelPCA suggests that I can do what I want; however, it is not explained in the documentation, and I can't find any examples of it being used. Even in the unit test source code for KernelPCA in sklearn, the code never seems to say what the precomputed kernel actually is.
Does anyone know how I would use my own precomputed kernel?
The precomputed kernel that you need to use at fit time is the Gram matrix between the samples. That is, if you have n_samples samples denoted by x_i, then you need to pass to fit, as its first argument, the matrix G defined by G_ij = K(x_i, x_j) for i, j between 0 and n_samples - 1.
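For example, for the linear kernel this is:

    import numpy as np

    def linear_kernel(X, Y):
        # Entry (i, j) is the inner product of row i of X with row j of Y.
        return X.dot(Y.T)

    X = np.random.randn(10, 20)   # 10 training samples, 20 features
    gram = linear_kernel(X, X)    # shape (10, 10)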
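For a kernel that scikit-learn does not provide, the same matrix can be filled in entry by entry. The sketch below is only an illustration under stated assumptions: `shared_bigram_kernel` and `gram_matrix` are hypothetical helpers, and strings stand in for the custom input format mentioned in the question. The kernel counts distinct character bigrams shared by two strings, which is an inner product of binary indicator vectors and therefore yields a positive semi-definite Gram matrix.

```python
import numpy as np

def shared_bigram_kernel(a, b):
    """Hypothetical string kernel: number of distinct character bigrams in common."""
    bigrams_a = {a[i:i + 2] for i in range(len(a) - 1)}
    bigrams_b = {b[i:i + 2] for i in range(len(b) - 1)}
    return float(len(bigrams_a & bigrams_b))

def gram_matrix(kernel, rows, cols):
    """Return G with G[i, j] = kernel(rows[i], cols[j])."""
    G = np.empty((len(rows), len(cols)))
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            G[i, j] = kernel(r, c)
    return G

# Illustrative samples in a non-vector input format.
samples = ["kernel", "kennel", "colonel", "tunnel"]

# Square (n_samples, n_samples) matrix, as needed at fit time.
gram = gram_matrix(shared_bigram_kernel, samples, samples)
```

Because the result is an ordinary NumPy array, it can be written to disk with `np.save` and reloaded with `np.load` before being handed to KernelPCA.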
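To project new data X_test, you need to pass transform the kernel evaluated between the test samples (rows) and the training samples (columns):

    X_test = np.random.randn(5, 20)          # 5 test samples, 20 features
    gram_test = linear_kernel(X_test, X)     # shape (5, 10)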
This usage can be seen in the scikit-learn unit tests for KernelPCA.
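Putting the pieces above together, a minimal end-to-end sketch might look like the following. The choice of `n_components=3` and the fixed random seed are assumptions made for illustration. The comparison against the built-in linear kernel is a sanity check: both routes should give the same projections up to the sign of each component.

```python
import numpy as np
from sklearn.decomposition import KernelPCA

def linear_kernel(X, Y):
    # Entry (i, j) is the inner product of row i of X with row j of Y.
    return X.dot(Y.T)

rng = np.random.RandomState(0)
X = rng.randn(10, 20)       # training samples
X_test = rng.randn(5, 20)   # new samples to project

gram = linear_kernel(X, X)             # (n_train, n_train), used by fit
gram_test = linear_kernel(X_test, X)   # (n_test, n_train), used by transform

# Fit on the precomputed Gram matrix instead of the raw features.
kpca = KernelPCA(n_components=3, kernel="precomputed")
train_proj = kpca.fit_transform(gram)
test_proj = kpca.transform(gram_test)

# Reference: let KernelPCA compute the linear kernel itself.
kpca_ref = KernelPCA(n_components=3, kernel="linear")
ref_proj = kpca_ref.fit(X).transform(X_test)

# Component signs are arbitrary, so compare absolute values.
same = np.allclose(np.abs(test_proj), np.abs(ref_proj), atol=1e-6)
```

Note that the columns of `gram_test` must follow the same sample order as the rows and columns of `gram`, since the fitted model only knows the training samples by their position in the matrix.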