LDA ignoring n_components?


Problem Description


When I try to work with LDA from Scikit-Learn, it keeps giving me only one component, even though I ask for more:

>>> import numpy as np
>>> from sklearn.lda import LDA
>>> x = np.random.randn(5,5)
>>> y = [True, False, True, False, True]
>>> for i in range(1,6):
...     lda = LDA(n_components=i)
...     model = lda.fit(x,y)
...     model.transform(x)

Gives

/Users/orthogonal/virtualenvs/osxml/lib/python2.7/site-packages/sklearn/lda.py:161: UserWarning: Variables are collinear
  warnings.warn("Variables are collinear")
array([[-0.12635305],
       [-1.09293574],
       [ 1.83978459],
       [-0.37521856],
       [-0.24527725]])
array([[-0.12635305],
       [-1.09293574],
       [ 1.83978459],
       [-0.37521856],
       [-0.24527725]])
array([[-0.12635305],
       [-1.09293574],
       [ 1.83978459],
       [-0.37521856],
       [-0.24527725]])
array([[-0.12635305],
       [-1.09293574],
       [ 1.83978459],
       [-0.37521856],
       [-0.24527725]])
array([[-0.12635305],
       [-1.09293574],
       [ 1.83978459],
       [-0.37521856],
       [-0.24527725]])

As you can see, it's only printing out one dimension each time. Why is this? Does it have anything to do with the variables being collinear?

Additionally, when I do this with Scikit-Learn's PCA, it gives me what I want.

>>> from sklearn.decomposition import PCA
>>> for i in range(1,6):
...     pca = PCA(n_components=i)
...     model = pca.fit(x)
...     model.transform(x)
... 
array([[ 0.83688322],
       [ 0.79565477],
       [-2.4373344 ],
       [ 0.72500848],
       [ 0.07978792]])
array([[ 0.83688322, -1.56459039],
       [ 0.79565477,  0.84710518],
       [-2.4373344 , -0.35548589],
       [ 0.72500848, -0.49079647],
       [ 0.07978792,  1.56376757]])
array([[ 0.83688322, -1.56459039, -0.3353066 ],
       [ 0.79565477,  0.84710518, -1.21454498],
       [-2.4373344 , -0.35548589, -0.16684946],
       [ 0.72500848, -0.49079647,  1.09006296],
       [ 0.07978792,  1.56376757,  0.62663807]])
array([[ 0.83688322, -1.56459039, -0.3353066 ,  0.22196922],
       [ 0.79565477,  0.84710518, -1.21454498, -0.15961993],
       [-2.4373344 , -0.35548589, -0.16684946, -0.04114339],
       [ 0.72500848, -0.49079647,  1.09006296, -0.2438673 ],
       [ 0.07978792,  1.56376757,  0.62663807,  0.2226614 ]])
array([[  8.36883220e-01,  -1.56459039e+00,  -3.35306597e-01,
          2.21969223e-01,  -1.66533454e-16],
       [  7.95654771e-01,   8.47105182e-01,  -1.21454498e+00,
         -1.59619933e-01,   3.33066907e-16],
       [ -2.43733440e+00,  -3.55485895e-01,  -1.66849458e-01,
         -4.11433949e-02,   0.00000000e+00],
       [  7.25008484e-01,  -4.90796471e-01,   1.09006296e+00,
         -2.43867297e-01,  -1.38777878e-16],
       [  7.97879229e-02,   1.56376757e+00,   6.26638070e-01,
          2.22661402e-01,   2.22044605e-16]])

Solution

The dimension reduction in LDA.transform comes down to a single line that projects the data onto scalings_. As described in the docstring, scalings_ has at most n_classes - 1 columns, so that is the maximum number of columns you can hope to obtain from transform. In your case, 2 classes (True, False) yield at most 1 column.
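
A quick way to convince yourself of this limit is to rerun the experiment with three classes instead of two. The following is a minimal sketch, assuming a current scikit-learn release, where the estimator has moved from sklearn.lda to sklearn.discriminant_analysis.LinearDiscriminantAnalysis:

>>> import numpy as np
>>> from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
>>> rng = np.random.RandomState(0)
>>> X = rng.randn(9, 5)   # 9 samples, 5 features
>>> y = [0, 1, 2] * 3     # 3 classes, so at most n_classes - 1 = 2 components
>>> for n in (1, 2):
...     lda = LinearDiscriminantAnalysis(n_components=n)
...     print(lda.fit(X, y).transform(X).shape)
...
(9, 1)
(9, 2)

Asking for n_components=3 here would exceed min(n_features, n_classes - 1); recent releases raise a ValueError rather than silently clipping the way the old sklearn.lda shown in the question did. PCA is not subject to this class-based limit; it is bounded only by min(n_samples, n_features), which is why it returned up to 5 columns above (the fifth column being numerically zero, since 5 centered samples span at most 4 dimensions).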
