How can I reduce the memory usage of Scikit-Learn vectorizers?

Views: 76 Published: 2020/5/4 9:03:15 Tags: python numpy machine-learning scipy scikit-learn

Problem description

TfidfVectorizer uses a lot of memory: vectorizing 100k documents (470 MB of text) takes over 6 GB. If we scale up to 21 million documents, the result will not fit in the 60 GB of RAM we have.

So we switched to HashingVectorizer, but we still need to know how to distribute the hashing vectorizer. Its fit and partial_fit methods do nothing, so how do we work with a huge corpus?
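
For context, HashingVectorizer is stateless: it learns no vocabulary, which is why fit and partial_fit are no-ops, and transform can be called on each chunk of the corpus independently. The following is only a minimal sketch of that idea, not a definitive solution. The helper `iter_chunks`, the toy `corpus`, the chunk size, and `n_features=2**18` are all illustrative assumptions and do not come from the question.

```python
from scipy import sparse
from sklearn.feature_extraction.text import HashingVectorizer


def iter_chunks(docs, chunk_size):
    """Yield successive lists of at most chunk_size documents (hypothetical helper)."""
    chunk = []
    for doc in docs:
        chunk.append(doc)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


# n_features is an assumed value; a smaller value saves memory
# at the cost of more hash collisions.
vectorizer = HashingVectorizer(n_features=2**18)

# Toy stand-in for a corpus that would normally be streamed from disk.
corpus = [
    "the quick brown fox",
    "jumps over the lazy dog",
    "the dog barks",
    "a fox runs",
    "hello world",
]

# No fit step is needed: each chunk is transformed on its own,
# so chunks could also be handled by separate workers.
blocks = [vectorizer.transform(chunk) for chunk in iter_chunks(corpus, chunk_size=2)]

# Stack the sparse blocks only if the full matrix is actually needed.
X = sparse.vstack(blocks, format="csr")
print(X.shape)
```

With a corpus that is truly too large for memory, stacking every block would defeat the purpose. One option is to pass each transformed chunk straight to an estimator that supports incremental learning through partial_fit, such as SGDClassifier, and then discard the chunk.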
