How can I reduce memory usage of Scikit-Learn Vectorizers?

Views: 21  Published: 2021/12/25 14:45:01  Tags: python numpy machine-learning scipy scikit-learn

Problem description

TfidfVectorizer uses too much memory: vectorizing 100k documents (470 MB) takes over 6 GB. If we scale to 21 million documents, the result will not fit in the 60 GB of RAM we have.
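To make the memory figures concrete, here is a minimal sketch of how one might measure the size of the matrix a TfidfVectorizer produces and compare the default float64 output with float32. The `sparse_nbytes` helper and the tiny `docs` list are hypothetical stand-ins, not part of the original question, and this only measures the output matrix; the fitted `vocabulary_` dictionary can also take a significant share of memory on a real corpus.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer


def sparse_nbytes(matrix):
    """Bytes held by the three arrays of a CSR matrix (hypothetical helper)."""
    return matrix.data.nbytes + matrix.indices.nbytes + matrix.indptr.nbytes


# Hypothetical stand-in corpus; the real one would be the 100k documents.
docs = [
    "the quick brown fox",
    "jumps over the lazy dog",
    "the dog barks at the fox",
    "a lazy afternoon in the sun",
]

vec64 = TfidfVectorizer()                  # default dtype is float64
vec32 = TfidfVectorizer(dtype=np.float32)  # halves the size of the value array

X64 = vec64.fit_transform(docs)
X32 = vec32.fit_transform(docs)

print("vocabulary size:", len(vec64.vocabulary_))
print("float64 matrix bytes:", sparse_nbytes(X64))
print("float32 matrix bytes:", sparse_nbytes(X32))
```

Only the value array shrinks with float32; the index arrays stay the same size, so the overall saving is less than half.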

So we switched to HashingVectorizer, but we still need to know how to distribute it. Its fit and partial_fit methods do nothing, so how do we work with a huge corpus?
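One way to read the "fit does nothing" observation: HashingVectorizer is stateless, so there is no vocabulary to learn and `transform` can be applied to each chunk of the corpus independently. The sketch below assumes a hypothetical `iter_batches` helper, a tiny stand-in `corpus`, and an arbitrary `n_features=2**18`; it shows that transforming in batches gives the same matrix as transforming everything at once.

```python
from itertools import islice

from scipy.sparse import vstack
from sklearn.feature_extraction.text import HashingVectorizer


def iter_batches(docs, batch_size):
    """Yield lists of at most batch_size documents from any iterable (hypothetical helper)."""
    iterator = iter(docs)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# No vocabulary is stored, so no fit step is needed before transform.
# n_features is an assumed value; fewer features means less memory but more hash collisions.
vectorizer = HashingVectorizer(n_features=2**18)

# Hypothetical stand-in corpus; in practice this would be a generator reading from disk.
corpus = [
    "the quick brown fox",
    "jumps over the lazy dog",
    "the dog barks at the fox",
    "a lazy afternoon in the sun",
    "the fox runs away",
]

# Each batch is transformed on its own; only one batch of raw text is in memory at a time.
blocks = [vectorizer.transform(batch) for batch in iter_batches(corpus, batch_size=2)]
X = vstack(blocks).tocsr()

# Same result as a single call over the whole corpus.
X_full = vectorizer.transform(corpus)
print(X.shape, abs(X - X_full).max())
```

Because each batch is independent, the batches could in principle be handled by separate workers or machines, or fed one at a time to an estimator that supports `partial_fit` (for example `SGDClassifier`) instead of being stacked into one matrix.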
