Why is scikit-learn neighbors slower with n_jobs > 1 and forkserver?

Views: 74 | Published: 2021/7/16 20:08:29 | Tags: python, scikit-learn, python-multiprocessing


Problem description

I'm using scikit-learn for some metaheuristics exercises and I have a question. I need to use kNN, so I have a k-nearest-neighbors classifier object (knn in the code below) created with n_jobs=-1. As the docs say, I have to set the multiprocessing start method to forkserver. But kNN is much slower with n_jobs=-1 than with n_jobs=1.

Here is a piece of the code:

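from time import time
import multiprocessing as mp

from sklearn.model_selection import StratifiedKFold

def main():
    ### Some initialization here ###
    # data, target, knn and logger are set up in the elided initialization.
    # scikit-learn < 0.18 spelled this StratifiedKFold(target, n_folds=2, shuffle=True)
    # and iterated over skf directly; split() is the current API.
    skf = StratifiedKFold(n_splits=2, shuffle=True)

    for train_index, test_index in skf.split(data, target):
        data_train, data_test = data[train_index], data[test_index]
        target_train, target_test = target[train_index], target[test_index]

        # Time one SFS run on this fold.
        start = time()
        selected_features, score = SFS(data_train, data_test, target_train, target_test, knn)
        end = time()

        logger.info("SFS - Time elapsed: " + str(end-start) + ". Score: " + str(score) + ". Selected features: " + str(sum(selected_features)))

if __name__ == "__main__":
    # The start method must be set before any worker process is created.
    mp.set_start_method('forkserver', force=True)
    main()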

Here is the SFS function:

import numpy as np

def SFS(data_train, data_test, target_train, target_test, classifier):
    # Greedy forward selection: each pass adds the one feature that most improves the test score.
    rowsize = data_train.shape[1]  # number of candidate features
    selected_features = np.zeros(rowsize, dtype=bool)  # np.bool was removed in NumPy 1.24
    best_score = 0
    best_feature = 0

    # Stop once a full pass finds no feature that beats best_score.
    while best_feature is not None:
        best_feature = None

        for idx in range(rowsize):
            if selected_features[idx]:
                continue

            # Tentatively add feature idx, then refit and score the classifier.
            selected_features[idx] = True
            classifier.fit(data_train[:,selected_features], target_train)
            score = classifier.score(data_test[:,selected_features], target_test)
            selected_features[idx] = False

            if score > best_score:
                best_score = score
                best_feature = idx

        if best_feature is not None:
            selected_features[best_feature] = True

    return selected_features, best_score

I don't understand how n_jobs > 1 can be slower than n_jobs = 1. Can anyone explain this? I've tried with 3 datasets.
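
One way to narrow the question down is to time fit() and score() separately for each n_jobs value, outside the SFS loop. The following is only a minimal sketch, not part of the original code: the helper names time_fit_score and compare_n_jobs, the synthetic dataset from make_classification, and the sizes used are all assumptions chosen for illustration. It does not explain the slowdown by itself; it only shows where the time goes.

```python
from time import perf_counter


def time_fit_score(classifier, data_train, target_train, data_test, target_test, repeats=3):
    """Return the best-of-`repeats` wall-clock seconds for fit() and for score()."""
    fit_times, score_times = [], []
    for _ in range(repeats):
        t0 = perf_counter()
        classifier.fit(data_train, target_train)
        t1 = perf_counter()
        classifier.score(data_test, target_test)
        t2 = perf_counter()
        fit_times.append(t1 - t0)
        score_times.append(t2 - t1)
    return min(fit_times), min(score_times)


def compare_n_jobs(n_samples=2000, n_features=20, settings=(1, -1)):
    """Time a KNeighborsClassifier on synthetic data for each n_jobs value.

    Hypothetical helper: the dataset and its sizes are illustrative assumptions,
    not the data used in the question.
    """
    # Imported here so time_fit_score stays usable without scikit-learn.
    from sklearn.datasets import make_classification
    from sklearn.neighbors import KNeighborsClassifier

    data, target = make_classification(
        n_samples=n_samples, n_features=n_features, random_state=0
    )
    half = n_samples // 2  # first half for training, second half for testing
    results = {}
    for n_jobs in settings:
        knn = KNeighborsClassifier(n_jobs=n_jobs)
        results[n_jobs] = time_fit_score(
            knn, data[:half], target[:half], data[half:], target[half:]
        )
    return results
```

Calling compare_n_jobs() returns a dict that maps each n_jobs value to a (fit_seconds, score_seconds) pair. Running it with different n_samples values, including sizes close to the real datasets, shows whether the gap between n_jobs=1 and n_jobs=-1 changes with the amount of work per call.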
