Python (scikit-learn) LDA collapsing to a single dimension


Problem description


I'm very new to scikit-learn and machine learning in general.

I am currently designing an SVM to predict whether a specific amino acid sequence will be cut by a protease. So far the SVM approach seems to be working quite well.

I'd like to visualize the distance between the two categories (cut and uncut), so I'm trying to use linear discriminant analysis, which is similar to principal component analysis, with the following code:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Binary labels: 1 = cleaved, 0 = not cleaved
targs = np.array([1 if _ else 0 for _ in XOR_list])
DATA = np.array(data_list)

# Fit LDA on the labelled data and project the data onto the discriminant axes
lda = LinearDiscriminantAnalysis(n_components=2)
X_r2 = lda.fit(DATA, targs).transform(DATA)

# Scatter the projected values for each class
plt.figure()
for c, i, target_name in zip("rg", [1, 0], ["Cleaved", "Not Cleaved"]):
    plt.scatter(X_r2[targs == i], X_r2[targs == i], c=c, label=target_name)
plt.legend()
plt.title('LDA of cleavage_site dataset')

However, LDA is only giving a 1-D result:

In: print(X_r2[:5])
Out: [[ 6.74369996]
 [ 4.14254941]
 [ 5.19537896]
 [ 7.00884032]
 [ 3.54707676]]

However, PCA does give 2 dimensions with the same input data:

from sklearn.decomposition import PCA

pca = PCA(n_components=2)
X_r = pca.fit(DATA).transform(DATA)
print(X_r[:5])
Out: [[ 0.05474151  0.38401203]
 [ 0.39244191  0.74113729]
 [-0.56785236 -0.30109694]
 [-0.55633116 -0.30267444]
 [ 0.41311866 -0.25501662]]

Edit: here are links to two Google Docs with the input data. I am not using the sequence information, just the numerical information that follows. The files are split between positive and negative control data. Input data: file1 file2

Solution

LDA is not a dimensionality reduction technique. LDA is a classifier; the fact that people visualize its decision function is just a side effect, and, unfortunately for your use case, the decision function for a binary problem (2 classes) is one-dimensional. There is nothing wrong with your code: this is what the decision function of every linear binary classifier looks like.
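If you still want to visualize how well the two classes separate, you can plot the single LDA component directly, for example as per-class histograms. A minimal sketch on synthetic data (make_classification here is only a stand-in for your cleavage dataset):

import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Synthetic stand-in for the real data: 200 samples, 2 classes
X, y = make_classification(n_samples=200, n_features=10, n_informative=4,
                           random_state=0)

# n_components=1 is the maximum LDA allows for 2 classes
lda = LinearDiscriminantAnalysis(n_components=1)
X_1d = lda.fit_transform(X, y)  # shape (200, 1)

plt.figure()
for label, name, color in [(1, "Cleaved", "r"), (0, "Not Cleaved", "g")]:
    plt.hist(X_1d[y == label, 0], bins=20, alpha=0.5, color=color, label=name)
plt.xlabel("LDA component 1")
plt.legend()
plt.title("Class separation along the single LDA axis")
plt.show()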

In general, for 2 classes you get at most a 1-dimensional projection, and for K > 2 classes you can get up to a (K-1)-dimensional projection. With other decomposition techniques (like one-vs-one) you can go up to K(K-1)/2 dimensions, but again only for more than 2 classes.
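You can verify both counts on scikit-learn's built-in iris dataset (used here only as a convenient 3-class example): LDA returns at most K-1 = 2 components, while an SVM with a one-vs-one decision function exposes K(K-1)/2 = 3 pairwise scores.

from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # K = 3 classes

# LDA: at most K - 1 = 2 discriminant components
lda = LinearDiscriminantAnalysis(n_components=2)
print(lda.fit_transform(X, y).shape)  # (150, 2)

# One-vs-one SVM: K * (K - 1) / 2 = 3 pairwise decision functions
svc = SVC(decision_function_shape="ovo").fit(X, y)
print(svc.decision_function(X).shape)  # (150, 3)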

