SVC classifier taking too much time for training

Problem description

I am using an SVC classifier with a linear kernel to train my model. Training data: 42,000 records.

    # Fit the SVC (default kernel is RBF unless kernel='linear' is passed;
    # probability=True adds a costly internal cross-validation)
    model = SVC(probability=True)
    model.fit(self.features_train, self.labels_train)
    # Evaluate on the held-out test set
    y_pred = model.predict(self.features_test)
    train_accuracy = model.score(self.features_train, self.labels_train)
    test_accuracy = model.score(self.features_test, self.labels_test)

It takes more than 2 hours to train the model. Am I doing something wrong? Also, what can be done to improve the training time?

Thanks in advance.

Answer

There are several possibilities to speed up your SVM training. Let n be the number of records and d the embedding dimensionality. I assume you are using scikit-learn.

• Reducing the size of the training set. Quoting the docs:

The fit time complexity is more than quadratic with the number of samples which makes it hard to scale to dataset with more than a couple of 10000 samples.

O(n^2) complexity will most likely dominate other factors. Sampling fewer records for training will thus have the largest impact on time. Besides random sampling, you could also try instance selection methods. For example, principal sample analysis has been proposed recently.
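
For example, a minimal sketch of random subsampling before fitting, on synthetic stand-in data (the 10,000-sample cap is an illustrative choice; stratify keeps the class balance of the subset):

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    # Synthetic stand-ins for the asker's features_train / labels_train (assumption)
    rng = np.random.default_rng(0)
    features_train = rng.normal(size=(42_000, 50))
    labels_train = rng.integers(0, 10, size=42_000)

    # Keep a stratified 10,000-sample subset; tune the cap to your time budget
    X_small, _, y_small, _ = train_test_split(
        features_train, labels_train,
        train_size=10_000, stratify=labels_train, random_state=0)

    model = SVC(kernel="linear")
    model.fit(X_small, y_small)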

• Reducing dimensionality. As others have hinted at in their comments, embedding dimension also impacts runtime. Computing inner products for the linear kernel is in O(d). Dimensionality reduction can, therefore, also reduce runtime. In another question, latent semantic indexing was suggested specifically for TF-IDF representations.
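
As a sketch of that idea for text features, TF-IDF followed by TruncatedSVD (scikit-learn's LSI) before the SVM; the toy corpus and the component count are illustrative assumptions:

    from sklearn.decomposition import TruncatedSVD
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import SVC

    # Toy corpus and labels standing in for the real training text (assumption)
    corpus = ["cheap flights to london", "buy cheap meds now",
              "flight deals for today", "meds without prescription"]
    labels = [0, 1, 0, 1]

    # TF-IDF -> LSI (TruncatedSVD) shrinks d before the kernel computation
    pipeline = make_pipeline(
        TfidfVectorizer(),
        TruncatedSVD(n_components=2),   # use e.g. 100-300 components on real data
        SVC(kernel="linear"))
    pipeline.fit(corpus, labels)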

• Different classifier. You may try sklearn.svm.LinearSVC, which is...

[s]imilar to SVC with parameter kernel=’linear’, but implemented in terms of liblinear rather than libsvm, so it has more flexibility in the choice of penalties and loss functions and should scale better to large numbers of samples.
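
A minimal sketch of that swap on stand-in data (note that LinearSVC has no predict_proba; if the probability output from the original code is needed, the model can be wrapped in sklearn.calibration.CalibratedClassifierCV):

    import numpy as np
    from sklearn.svm import LinearSVC

    # Synthetic stand-in for the 42,000-record training set (assumption)
    rng = np.random.default_rng(0)
    X = rng.normal(size=(42_000, 50))
    y = rng.integers(0, 10, size=42_000)

    # liblinear-based linear SVM; scales far better with n than SVC(kernel='linear')
    model = LinearSVC(C=1.0, max_iter=5000)
    model.fit(X, y)
    print("train accuracy:", model.score(X, y))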

Moreover, a scikit-learn dev suggested the kernel_approximation module in a similar question.
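
A sketch of that approach, assuming an RBF kernel is approximated with Nystroem features feeding a linear model (the data and component count are illustrative):

    import numpy as np
    from sklearn.kernel_approximation import Nystroem
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    # Synthetic stand-in data (assumption)
    rng = np.random.default_rng(0)
    X = rng.normal(size=(10_000, 50))
    y = rng.integers(0, 10, size=10_000)

    # Approximate an RBF kernel map, then train a fast linear SVM on those features
    model = make_pipeline(
        Nystroem(kernel="rbf", n_components=300, random_state=0),
        LinearSVC(max_iter=5000))
    model.fit(X, y)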
