Scaling issues with scipy.sparse matrix while using scikit
Question
While solving a machine learning problem with scikit (Python), I need to scale a scipy.sparse matrix before training an SVM on it, in order to achieve higher accuracy. But the documentation clearly states that:
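scale and StandardScaler accept scipy.sparse matrices as input only when with_mean=False is explicitly passed to the constructor. Otherwise a ValueError will be raised, as silently centering would break the sparsity and would often crash the execution by allocating excessive amounts of memory unintentionally.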
This means that I cannot get zero mean this way. So how do I scale this sparse matrix so that it has zero mean as well as unit variance? I also need to store this 'scaling' so that I can apply the same transformation to the test matrix.
Recommended answer
If the matrix is small, you can densify it with X.toarray(). If the matrix is large, then this will probably blow your RAM.
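A minimal sketch of both situations, assuming a reasonably recent scikit-learn and SciPy. The matrices and the helper make_sparse are made up for illustration. The first option densifies and centers, as suggested above. The second is not part of the answer: it uses with_mean=False, the only mode in which StandardScaler accepts sparse input, which gives unit variance but leaves the mean untouched. In both cases the fitted scaler object is the stored "scaling" that gets reused on the test matrix.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)


def make_sparse(n_rows, n_cols):
    """Hypothetical helper: random CSR matrix with roughly 20% non-zeros."""
    dense = rng.random((n_rows, n_cols))
    dense[dense < 0.8] = 0.0
    return sparse.csr_matrix(dense)


X_train = make_sparse(100, 20)
X_test = make_sparse(30, 20)

# Option 1 (small matrices only): densify, then center and scale.
dense_scaler = StandardScaler()
X_train_dense = dense_scaler.fit_transform(X_train.toarray())
# Reuse the statistics learned on the training data for the test data.
X_test_dense = dense_scaler.transform(X_test.toarray())

# Option 2 (large matrices): stay sparse and scale to unit variance only.
# with_mean=False skips centering, so zeros stay zeros.
sparse_scaler = StandardScaler(with_mean=False)
X_train_scaled = sparse_scaler.fit_transform(X_train)
X_test_scaled = sparse_scaler.transform(X_test)
```

If the fitted scaler has to outlive the training process, it can be persisted like any other estimator, for example with pickle or joblib.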
As an alternative to mean-centering and scaling, you can try per-sample normalization with sklearn.preprocessing.Normalizer; this is appropriate for frequency features (e.g. in text classification).
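A short sketch of per-sample normalization on sparse input. The count matrices below are invented stand-ins for term-frequency features. Normalizer rescales each row to unit norm (L2 here) and keeps the result sparse. It learns nothing from the data, so the same object can be applied to the test matrix without storing any statistics.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import Normalizer

# Made-up term counts: rows are documents, columns are terms.
counts_train = sparse.csr_matrix(
    np.array([[3, 0, 1, 0],
              [0, 2, 0, 0],
              [1, 1, 0, 4]], dtype=float)
)
counts_test = sparse.csr_matrix(np.array([[0, 5, 0, 5]], dtype=float))

normalizer = Normalizer(norm="l2")
# fit() is a no-op for Normalizer; each row is scaled independently.
train_normalized = normalizer.fit_transform(counts_train)
test_normalized = normalizer.transform(counts_test)

# Every row now has Euclidean length 1.
row_norms = np.sqrt(
    np.asarray(train_normalized.multiply(train_normalized).sum(axis=1))
).ravel()
```

Because the scaling is per row rather than per column, this does not produce zero mean or unit variance per feature; it is a different preprocessing choice, not an equivalent one.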