PCA for categorical features?

Question

As I understood it, PCA can be performed only on continuous features. But while trying to understand the difference between one-hot encoding and label encoding, I came across a post at the following link:

When to use One Hot Encoding vs LabelEncoder vs DictVectorizor?

It states that one-hot encoding followed by PCA is a very good method, which basically means PCA is being applied to categorical features. Hence I am confused; please advise.

Recommended answer

I disagree with the others.

While you can use PCA on binary data (e.g. one-hot encoded data), that does not mean it is a good idea, or that it will work very well.

PCA is designed for continuous variables. It tries to minimize squared deviations (equivalently, it keeps the directions along which the data has the most variance). The concept of squared deviations breaks down when you have binary variables.
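One way to read this claim, offered as an illustration rather than as the answerer's own argument: for a 0/1 column, the variance is fully determined by the share of ones, p, and equals p(1 - p). The squared deviations therefore carry no information beyond how often the category occurs. The sketch below uses a hypothetical helper, `population_variance`, and made-up columns to show this.

```python
def population_variance(values):
    """Mean squared deviation from the mean (population variance)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n

# Made-up binary columns with 1, 3, and 5 ones out of 10 rows.
for ones in (1, 3, 5):
    column = [1.0] * ones + [0.0] * (10 - ones)
    p = ones / 10
    # The two printed numbers agree (up to floating-point rounding):
    # the variance of a 0/1 column is just p * (1 - p).
    print(p, population_variance(column), p * (1 - p))
```

Every entry in such a column sits at one of only two distances from the mean (p or 1 - p), which is quite different from the spread of a continuous variable.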

So yes, you can use PCA. And yes, you get an output. It is even a least-squares output: it's not as if PCA would segfault on such data. It works, but it is much less meaningful than you'd want it to be, and presumably less meaningful than, e.g., frequent pattern mining.
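To make "you get an output" concrete, here is a minimal sketch in plain Python. It assumes a made-up `colors` column and hypothetical helper functions, and it uses power iteration on the covariance matrix as a stand-in for a library PCA routine. It one-hot encodes the column, extracts the first principal component, and projects the data onto it.

```python
import math

def one_hot(values):
    """One-hot encode category labels (columns in sorted label order)."""
    categories = sorted(set(values))
    return categories, [[1.0 if v == c else 0.0 for c in categories] for v in values]

def center(rows):
    """Subtract each column's mean; return (centered rows, column means)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows], means

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, iterations=500):
    """Leading eigenvector of a symmetric matrix via power iteration."""
    d = len(cov)
    v = [1.0 / (i + 1) for i in range(d)]  # arbitrary starting vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Made-up categorical column: 5 red, 3 green, 2 blue.
colors = ["red"] * 5 + ["green"] * 3 + ["blue"] * 2
categories, encoded = one_hot(colors)
centered, means = center(encoded)
component = first_component(covariance(centered))

# Project each row onto the first component, then map back to the
# one-hot space to see what a rank-1 reconstruction looks like.
scores = [sum(x * c for x, c in zip(row, component)) for row in centered]
reconstruction = [[m + s * c for m, c in zip(means, component)] for s in scores]

print(categories)
print(sorted(set(scores)))
print(reconstruction[0])
```

The procedure runs without complaint, but each row's score depends only on its category, so the projection yields just one distinct value per category, and the rank-1 reconstruction contains fractional values that are not valid 0/1 indicators. In practice one would use a library implementation; the hand-rolled version here only keeps the sketch self-contained.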
