SVM is very slow when training classifier on big number of classes


Question

I'm trying to train an SVM classifier on a big number of items and classes, and it is becoming really, really slow.

First of all, I've extracted a feature set from my data, to be specific 512 features per item, and put it in a numpy array. There are 13k items in this array. It looks like this:

>>> print(type(X_train))
<class 'numpy.ndarray'>

>>> print(X_train)
[[ 0.01988654 -0.02607637  0.04691431 ...  0.11521499  0.03433102
  0.01791015]
[-0.00058317  0.05720023  0.03854145 ...  0.07057668  0.09192026
  0.01479562]
[ 0.01506544  0.05616265  0.01514515 ...  0.04981219  0.05810429
  0.00232013]
...

Also, there are ~4k different classes:

>>> print(type(labels))
<class 'list'>
>>> print(labels)
[0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4, 5, 5, ... ]

Here is the classifier:

import pickle
from thundersvmScikit import SVC

FILENAME = 'dataset.pickle'

with open(FILENAME, 'rb') as infile:
    (X_train, labels) = pickle.load(infile)

clf = SVC(kernel='linear', probability=True)
clf.fit(X_train, labels)

After ~90 hours have passed (and I'm using a GPU implementation of the scikit-learn API in the form of thundersvm), the fit operation is still running. Taking into account that this is a pretty small dataset in my case, I definitely need something more efficient, but I don't seem to be having any success with that. For example, I've tried this type of Keras model:

from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(input_dim=512, units=100, activation='tanh'))
model.add(Dropout(0.2))
model.add(Dense(units=n_classes, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy'])
# labels are assumed here to be one-hot encoded, as categorical_crossentropy requires
model.fit(X_train, labels, epochs=500, batch_size=64, validation_split=0.1, shuffle=True)

I end up with pretty good accuracy during the training stage:

Epoch 500/500
11988/11988 [==============================] - 1s 111us/step - loss: 2.1398 - acc: 0.8972 - val_loss: 9.5077 - val_acc: 0.0000e+00

However, during actual testing, even on data that was present in the training dataset, I get extremely low accuracy, predicting basically random classes:

Predictions (best probabilities):
  0  class710015: 0.008
  1  class715573: 0.007
  2  class726619: 0.006
  3  class726619: 0.010
  4  class720439: 0.007
Accuracy: 0.000

Could you please point me in the right direction with this? Should I adjust the SVM approach somehow, or should I switch to a custom Keras model for this type of problem? If so, what might be wrong with my model?

Thanks a lot.

Answer

SVM is most natural for binary classification. For multiclass problems, scikit-learn's SVC uses a one-versus-one scheme (https://scikit-learn.org/stable/modules/svm.html), training K(K-1)/2 binary classifiers, where K is the number of classes. The running time therefore grows as O(K^2); with K ≈ 4000 that is roughly 8 million pairwise subproblems. This is why it is so slow.
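The difference in the number of binary subproblems is easy to check; here is a quick sketch in plain Python, with the ~4k class count taken from the question:

```python
# One-vs-one trains one binary SVM per unordered pair of classes:
# K * (K - 1) / 2 subproblems for K classes.
def ovo_classifier_count(k: int) -> int:
    return k * (k - 1) // 2

# One-vs-rest trains a single binary SVM per class: K subproblems.
def ovr_classifier_count(k: int) -> int:
    return k

K = 4000  # ~4k classes, as in the question
print(ovo_classifier_count(K))  # 7998000 pairwise SVMs
print(ovr_classifier_count(K))  # 4000 one-vs-rest SVMs
```

So even before kernel computations enter the picture, one-vs-one does about 2000x more fitting work here than one-vs-rest.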

You should either reduce the number of classes, or switch to other models such as neural networks or decision trees.
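As a sketch of the decision-tree option: a single tree handles all classes in one model, so there is no per-pair training at all. The data below is a synthetic stand-in for the question's 512-feature array (40 classes instead of ~4k, purely to keep the demo fast):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Tiny synthetic stand-in for the 512-feature, multi-class data in the question.
X = rng.normal(size=(300, 512))
y = np.arange(300) % 40  # 40 classes, each guaranteed to appear

clf = DecisionTreeClassifier(max_depth=20, random_state=0)
clf.fit(X, y)  # one model covers all classes; no pairwise training
print(clf.predict(X[:5]).shape)  # (5,)
```

On purely random features the tree will just memorize, of course; the point is only the training-cost structure, not the accuracy.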

P.S.: scikit-learn also has a one-vs-rest approach for SVM (https://scikit-learn.org/stable/modules/multiclass.html), which is O(K). You could also try that.
