Scaling issues with scipy.sparse matrix while using scikit


Question

While solving a machine learning problem with scikit-learn (Python), I need to scale a scipy.sparse matrix before training an SVM in order to achieve higher accuracy. But the scikit-learn documentation clearly states that:

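scale and StandardScaler accept scipy.sparse matrices as input only when with_mean=False is explicitly passed to the constructor. Otherwise a ValueError will be raised, as silently centering would break the sparsity and would often crash the execution by allocating excessive amounts of memory unintentionally.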

This means that I cannot get zero mean this way. So how do I scale this sparse matrix so that it has zero mean as well as unit variance? I also need to store this 'scaling' so that I can apply the same transformation to the test matrix.
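
To make the quoted behaviour concrete, here is a minimal sketch. The toy matrices and variable names are assumptions for illustration and are not part of the original post. It shows that the default StandardScaler rejects sparse input, while with_mean=False scales to unit variance without centering, and that the fitted scaler object stores the scaling so it can be reused on a test matrix.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import StandardScaler

# Toy sparse data (hypothetical values, for illustration only).
X_train = sparse.csr_matrix(np.array([[1.0, 0.0, 2.0],
                                      [0.0, 0.0, 4.0],
                                      [3.0, 0.0, 0.0]]))
X_test = sparse.csr_matrix(np.array([[2.0, 1.0, 0.0]]))

# Default with_mean=True: fitting on sparse input raises ValueError.
try:
    StandardScaler().fit(X_train)
    centering_rejected = False
except ValueError:
    centering_rejected = True

# with_mean=False: variance scaling only, so the output stays sparse.
scaler = StandardScaler(with_mean=False).fit(X_train)
X_train_scaled = scaler.transform(X_train)

# The fitted scaler holds the per-feature scale; reuse it on the test matrix.
X_test_scaled = scaler.transform(X_test)
```

The result has unit variance per non-constant feature but not zero mean, which is exactly the limitation the question is about.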

Recommended answer

If the matrix is small, you can densify it with X.toarray(). If the matrix is large, this will probably exhaust your RAM.
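
A minimal sketch of the densify route, assuming the data fits in memory. The toy matrices and variable names are hypothetical. StandardScaler is fitted on the dense training array and the same fitted object is then applied to the test matrix.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import StandardScaler

# Toy sparse data (hypothetical values, for illustration only).
X_train = sparse.csr_matrix(np.array([[1.0, 0.0, 2.0],
                                      [0.0, 1.0, 4.0],
                                      [3.0, 0.0, 0.0]]))
X_test = sparse.csr_matrix(np.array([[2.0, 1.0, 0.0]]))

# Densify, then fit a scaler that centers and scales each feature.
X_train_dense = X_train.toarray()
scaler = StandardScaler().fit(X_train_dense)
X_train_scaled = scaler.transform(X_train_dense)

# Apply the stored training mean and scale to the densified test matrix.
X_test_scaled = scaler.transform(X_test.toarray())
```

As a rough guide, a dense float64 array needs n_samples × n_features × 8 bytes, so it is worth checking X.shape before calling toarray().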

As an alternative to mean-centering and scaling, you can try per-sample normalization with sklearn.preprocessing.Normalizer; this is appropriate for frequency features (e.g. in text classification).
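
A minimal sketch of per-sample normalization on sparse input. The toy count matrix and variable names are assumptions for illustration. Each row is rescaled to unit L2 norm, and zero entries stay zero, so sparsity is preserved.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import Normalizer

# Toy term-count matrix (hypothetical values, for illustration only).
X_train = sparse.csr_matrix(np.array([[3.0, 0.0, 4.0],
                                      [0.0, 2.0, 0.0]]))
X_test = sparse.csr_matrix(np.array([[0.0, 5.0, 12.0]]))

# Normalizer works row by row, so it learns nothing from the training data;
# fit_transform is used here only to follow the usual estimator workflow.
normalizer = Normalizer(norm="l2")
X_train_norm = normalizer.fit_transform(X_train)
X_test_norm = normalizer.transform(X_test)

# Each row of the result should have Euclidean length 1.
row_norms = np.sqrt(
    np.asarray(X_train_norm.multiply(X_train_norm).sum(axis=1))
).ravel()
```

Because the normalization is computed independently for each sample, there are no fitted statistics to store; the same Normalizer settings can simply be applied to the test matrix.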
