Random seed on SVM sklearn produces different results


Problem description

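When I run the SVM, I get different results even with a fixed random_state=42.

I have 10 classes and a dataset of 200 examples. The dimensions of my dataset are dim_dataset=(200,2048).

Here is my code: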

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn import svm
import random
random.seed(42)

def shuffle_data(x,y):
    idx = np.random.permutation(len(x))
    x_data= x[idx]
    y_labels=y[idx]
    return x_data,y_labels

d,l=shuffle_data(dataset,true_labels) # dim_d=(200,2048) , dim_l=(200,)

X_train, X_test, y_train, y_test = train_test_split(d, l, test_size=0.30, random_state=42)

# hist intersection kernel
gramMatrix = histogramIntersection(X_train, X_train)
clf_gram = svm.SVC(kernel='precomputed', random_state=42).fit(gramMatrix, y_train)
predictMatrix = histogramIntersection(X_test, X_train)
SVMResults = clf_gram.predict(predictMatrix)
correct = sum(1.0 * (SVMResults == y_test))
accuracy = correct / len(y_test)
print("SVM (Histogram Intersection): " + str(accuracy) + " (" + str(int(correct)) + "/" + str(len(y_test)) + ")")


# libsvm linear kernel
clf_linear_kernel = svm.SVC(kernel='linear', random_state=42).fit(X_train, y_train)
predicted_linear = clf_linear_kernel.predict(X_test)
correct_linear_libsvm = sum(1.0 * (predicted_linear == y_test))
accuracy_linear_libsvm = correct_linear_libsvm / len(y_test)
print("SVM (linear kernel libsvm): " + str(accuracy_linear_libsvm) + " (" + str(int(correct_linear_libsvm)) + "/" + str(len(y_test)) + ")")

# liblinear linear kernel

clf_linear_kernel_liblinear = LinearSVC(random_state=42).fit(X_train, y_train)
predicted_linear_liblinear = clf_linear_kernel_liblinear.predict(X_test)
correct_linear_liblinear = sum(1.0 * (predicted_linear_liblinear == y_test))
accuracy_linear_liblinear = correct_linear_liblinear / len(y_test)
print("SVM (linear kernel liblinear): " + str(accuracy_linear_liblinear) + " (" + str(
        int(correct_linear_liblinear)) + "/" + str(len(y_test)) + ")")

What's wrong with my code?

Solution

Use numpy.random.seed() instead of the plain random.seed(), like this:

    np.random.seed(42)

scikit-learn uses NumPy internally to generate random numbers, so calling only random.seed does not affect NumPy's generator. In the code above, np.random.permutation inside shuffle_data therefore returns a different ordering on every run, and train_test_split then splits a differently ordered dataset even though its own random_state is fixed.

See the following links for more detail:

  • Should I use `random.seed` or `numpy.random.seed` to control random number generation in `scikit-learn`?
  • http://scikit-learn.org/stable/developers/utilities.html#validation-tools
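The difference between the two seeding calls can be checked with a small experiment. The following is only a sketch: draw_permutation is a hypothetical helper, the seed argument added to shuffle_data is not in the question's code, and the stand-in arrays are much smaller than the question's (200, 2048) dataset. It uses NumPy only, so it does not exercise the SVM itself.

```python
import random

import numpy as np


def draw_permutation(seed_fn, n=200):
    """Call seed_fn(42), then draw a permutation from NumPy's global generator."""
    seed_fn(42)
    return np.random.permutation(n)


# Seeding only the standard-library generator: NumPy's global state keeps
# advancing, so the two draws differ (with overwhelming probability).
stdlib_first = draw_permutation(random.seed)
stdlib_second = draw_permutation(random.seed)
print("random.seed reproducible:   ", np.array_equal(stdlib_first, stdlib_second))

# Seeding NumPy's global generator: both draws are identical.
numpy_first = draw_permutation(np.random.seed)
numpy_second = draw_permutation(np.random.seed)
print("np.random.seed reproducible:", np.array_equal(numpy_first, numpy_second))


def shuffle_data(x, y, seed=None):
    """Variant of the question's shuffle_data with a hypothetical seed argument."""
    rng = np.random.RandomState(seed)  # local generator; global state is untouched
    idx = rng.permutation(len(x))
    return x[idx], y[idx]


# Stand-in data (assumed shapes, far smaller than the real dataset).
x = np.arange(20).reshape(10, 2)
y = np.arange(10)
x1, y1 = shuffle_data(x, y, seed=42)
x2, y2 = shuffle_data(x, y, seed=42)
print("seeded shuffle reproducible:", np.array_equal(x1, x2) and np.array_equal(y1, y2))
```

Passing a seed into a local RandomState, as in the second half, is one possible alternative to seeding NumPy globally: the shuffle becomes reproducible without changing the random state seen by other code in the same process.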
