Multiprocessing scikit-learn


Question

I got LinearSVC working against a training set and a test set using the load_file method, and now I am trying to get it working in a multiprocessor environment.

How can I make multiprocessing work for LinearSVC().fit() and LinearSVC().predict()? I am not really familiar with scikit-learn's data types yet.

I am also thinking about splitting the samples into multiple arrays, but I am not familiar with NumPy arrays and scikit-learn data structures.

Would it be easier to feed this into multiprocessing.Pool(): split the samples into chunks, train on them, and combine the trained sets back together later? Would that work?

Here is my scenario:

Let's say we have 1 million files in the training sample set. When we want to distribute the processing of TfidfVectorizer across several processors, we have to split those samples (in my case there will only be two categories, so say 500,000 samples each to train on). My server has 24 cores and 48 GB of RAM, so I want to split each topic into 1,000,000 / 24 chunks and run TfidfVectorizer on them. I would do the same for the test sample set, as well as for SVC.fit() and decide(). Does that make sense?
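The chunking step described above can be sketched with NumPy alone. This is a minimal illustration, not the asker's actual pipeline: the sample IDs stand in for the million training files, and `np.array_split` is used because it tolerates sizes that do not divide evenly across the cores.

```python
import numpy as np

# Stand-in for the 1,000,000-file training set.
n_samples, n_cores = 1_000_000, 24
sample_ids = np.arange(n_samples)

# np.array_split (unlike np.split) allows an uneven division, so each
# of the 24 cores receives roughly n_samples / n_cores items.
chunks = np.array_split(sample_ids, n_cores)

print(len(chunks))                   # 24
print(sum(len(c) for c in chunks))   # 1000000
```

Each chunk could then be handed to one worker in a `multiprocessing.Pool` for vectorization.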

Thanks.

PS: Please do not close this.

Answer

I think using SGDClassifier instead of LinearSVC for this kind of data would be a good idea, as it is much faster. For the vectorization, I suggest you look into the hashing transformer PR.
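As a rough sketch of that suggestion: the hashing transformer the answer refers to later landed in scikit-learn as `HashingVectorizer`. The corpus and labels below are made-up placeholders; the point is that the vectorizer is stateless (no vocabulary to fit), which is what makes chunk-by-chunk processing easy.

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

# Tiny stand-in corpus; the real input would be the training files.
docs = ["spam spam ham", "ham eggs", "spam offer now", "eggs and ham"]
labels = [1, 0, 1, 0]

# HashingVectorizer needs no fit step, so separate processes can
# transform their own chunks without sharing a vocabulary.
vec = HashingVectorizer(n_features=2**18)
X = vec.transform(docs)

# loss="hinge" makes this a linear SVM trained by SGD,
# i.e. a faster drop-in for LinearSVC on large data.
clf = SGDClassifier(loss="hinge", random_state=0)
clf.fit(X, labels)
preds = clf.predict(vec.transform(["spam offer"]))
```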

For the multiprocessing: you can distribute the data sets across cores, do partial_fit, get the weight vectors, average them, distribute them back to the estimators, and do partial_fit again.
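A minimal sketch of that averaging scheme, under some assumptions not in the original answer: the data here is synthetic, the per-chunk fits run sequentially (in a real setup each `partial_fit` call would run in its own process, e.g. via `multiprocessing.Pool`), and the averaged weights are written back by assigning `coef_` and `intercept_` directly, which works for SGDClassifier but is an implementation detail.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Synthetic, linearly separable data standing in for the real corpus.
rng = np.random.RandomState(0)
X = rng.randn(600, 20)
y = (X[:, 0] + X[:, 1] > 0).astype(int)
classes = np.array([0, 1])

# One clone per chunk; pretend each loop iteration is a worker process.
clones = []
for idx in np.array_split(np.arange(len(X)), 4):
    c = SGDClassifier(loss="hinge", random_state=0)
    c.partial_fit(X[idx], y[idx], classes=classes)
    clones.append(c)

# Average the weight vectors and intercepts, write them into a single
# estimator; the next round of partial_fit can then start from there.
avg = SGDClassifier(loss="hinge", random_state=0)
avg.partial_fit(X[:10], y[:10], classes=classes)  # allocate coef_ shape
avg.coef_ = np.mean([c.coef_ for c in clones], axis=0)
avg.intercept_ = np.mean([c.intercept_ for c in clones], axis=0)

acc = (avg.predict(X) == y).mean()  # averaged model, scored on all data
```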

Parallel gradient descent is an area of active research, so there is no ready-made solution there.

By the way, how many classes does your data have? A separate classifier will be trained (automatically) for each class. If you have nearly as many classes as cores, it might be better and much easier to just do one class per core, by specifying n_jobs in SGDClassifier.
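To illustrate the n_jobs suggestion: SGDClassifier handles multi-class problems one-vs-all, fitting one binary classifier per class, and `n_jobs` fans those per-class fits out across cores. The dataset below is synthetic, standing in for a problem with "nearly as many classes as cores".

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Synthetic 8-class problem; each of the 8 one-vs-all fits can run
# on its own core when n_jobs=8.
X, y = make_classification(n_samples=2000, n_features=50,
                           n_informative=10, n_classes=8,
                           random_state=0)

clf = SGDClassifier(loss="hinge", n_jobs=8, random_state=0)
clf.fit(X, y)
print(clf.coef_.shape)   # one weight vector per class: (8, 50)
```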
