Importance of PCA or SVD in machine learning


Question

All this time (especially in the Netflix contest), I keep coming across blogs (or leaderboard forums) where they mention how applying a simple SVD step to the data helped them reduce sparsity in the data or, in general, improved the performance of the algorithm at hand. I have been trying to work out why for a long time, but I can't. In general, the data I get is very noisy (which is also the fun part of big data), and I do know some basic feature-scaling techniques, such as log transformation and mean normalization. But how does something like SVD help? So let's say I have a huge matrix of users rating movies, and in this matrix I implement some version of a recommendation system (say, collaborative filtering):

1) Without SVD
2) With SVD

How does it help?

Answer

SVD is not used to normalize the data, but to get rid of redundant data, that is, for dimensionality reduction. For example, if you have two variables, one a humidity index and the other the probability of rain, their correlation is so high that the second contributes no additional information useful for a classification or regression task. The singular values in SVD help you determine which variables are most informative, and which ones you can do without.
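As a quick illustration, here is a minimal NumPy sketch (the humidity/rain data is synthetic, made up for this example): two nearly collinear features produce one dominant singular value and one close to zero, flagging the second feature as redundant.

    import numpy as np

    rng = np.random.default_rng(0)
    humidity = rng.random(100)                           # hypothetical humidity index
    rain_prob = 0.9 * humidity + 0.01 * rng.random(100)  # almost perfectly correlated
    X = np.column_stack([humidity, rain_prob])

    # Center the columns, then inspect the singular values
    s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
    print(s)  # one large value, one near zero -> second variable adds little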

The way it works is simple. You perform SVD over your training data (call it matrix A) to obtain U, S and V*. Then set to zero all values of S below a certain arbitrary threshold (e.g. 0.1), and call this new matrix S'. Then obtain A' = US'V* and use A' as your new training data. Some of your features are now set to zero and can be removed, sometimes without any performance penalty (depending on your data and the chosen threshold). This is called k-truncated SVD.
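A minimal NumPy sketch of exactly that procedure (the 0.1 threshold is the arbitrary example value from above):

    import numpy as np

    def truncated_svd(A, threshold=0.1):
        """Zero out singular values of A below threshold and reconstruct."""
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        s_prime = np.where(s < threshold, 0.0, s)  # S': small values set to zero
        return U @ np.diag(s_prime) @ Vt           # A' = U S' V*

    # Example: a noisy, approximately rank-2 training matrix
    rng = np.random.default_rng(0)
    A = rng.random((6, 2)) @ rng.random((2, 4)) + 0.01 * rng.random((6, 4))
    A_prime = truncated_svd(A)  # use A_prime as the new training data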

SVD doesn't help you with sparsity, though; it only helps when features are redundant. Two features can be both sparse and informative (relevant) for a prediction task, so you can't remove either one.

Using SVD, you go from n features to k features, where each new feature is a linear combination of the original n. It's a dimensionality reduction step, just like feature selection is. When redundant features are present, though, a feature selection algorithm may lead to better classification performance than SVD, depending on your data set (for example, maximum-entropy feature selection). Weka comes with a bunch of them.
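For the n-to-k reduction itself, here is a minimal sketch using scikit-learn's TruncatedSVD (an assumption on my part; the answer itself only names Weka):

    import numpy as np
    from sklearn.decomposition import TruncatedSVD

    rng = np.random.default_rng(0)
    X = rng.random((100, 20))           # 100 samples, n = 20 original features

    svd = TruncatedSVD(n_components=5)  # keep k = 5 components
    X_k = svd.fit_transform(X)          # each new feature is a linear combination of the 20
    print(X_k.shape)                    # (100, 5)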

See: http://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Dimensionality_Reduction/Singular_Value_Decomposition

https://stats.stackexchange.com/questions/33142/what-happens-when-you-apply-svd-to-a-collaborative-filtering-problem-what-is-th

