How many features can scikit-learn handle?


Question

I have a CSV file of size [66k, 56k] (rows, columns). It's a sparse matrix. I know that numpy can handle a matrix of that size. Based on everyone's experience, how many features can scikit-learn algorithms handle comfortably?
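For a sense of scale, here is a rough back-of-the-envelope sketch of the memory such a matrix would need in dense versus sparse (CSR) form. The 0.1% density, the float64 values, and the int32 index arrays are assumptions made for illustration only; the question does not state them.

```python
# Rough memory estimate for a 66k x 56k matrix.
# ASSUMPTIONS (not from the question): float64 values, int32 CSR index
# arrays, and a density of 0.1% non-zero entries.
n_rows, n_cols = 66_000, 56_000

# Dense storage: every cell costs 8 bytes (float64).
dense_bytes = n_rows * n_cols * 8

# CSR storage: one 8-byte value and one 4-byte column index per
# non-zero entry, plus a row-pointer array of n_rows + 1 int32 entries.
nnz = n_rows * n_cols // 1000  # 0.1% of all cells, integer arithmetic
csr_bytes = nnz * (8 + 4) + (n_rows + 1) * 4

print(f"dense: {dense_bytes / 1e9:.1f} GB")
print(f"CSR:   {csr_bytes / 1e6:.1f} MB")
```

Under these assumptions the dense array needs roughly 30 GB while the CSR form needs tens of megabytes, which is why keeping the data sparse end to end matters at this size.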

Recommended answer

It depends on the estimator. At that size, linear models still perform well, while kernel SVMs will probably take forever to train (and forget about random forests, since at the time of writing they won't handle sparse matrices).

I've personally used LinearSVC, LogisticRegression, and SGDClassifier with sparse matrices of roughly 300k × 3.3 million without any trouble. See @amueller's scikit-learn cheat sheet for picking the right estimator for the job at hand.
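As a minimal sketch of what that looks like in practice, the three estimators named above can be fitted directly on a SciPy CSR matrix without densifying it. The data here is synthetic and far smaller than the sizes mentioned in the answer; the shape, the density, and the hidden linear labelling rule are all assumptions chosen so the example runs quickly.

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.svm import LinearSVC

rng = np.random.RandomState(0)

# Synthetic sparse design matrix (illustrative size, 1% non-zeros).
n_samples, n_features = 2000, 5000
X = sparse.random(n_samples, n_features, density=0.01,
                  format="csr", random_state=rng)

# Hypothetical labels generated from a hidden linear rule.
w_true = rng.randn(n_features)
y = (X @ w_true > 0).astype(int)

models = {
    "LinearSVC": LinearSVC(),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "SGDClassifier": SGDClassifier(random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X, y)                     # X stays sparse throughout
    scores[name] = model.score(X, y)    # training accuracy only
    print(f"{name}: train accuracy = {scores[name]:.3f}")
```

The scores printed are training accuracy on toy data and say nothing about real-world performance; the point is only that `fit` accepts the sparse matrix as is.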

Full disclosure: I'm a scikit-learn core developer.
