pandas dataframe memory python

Views: 143 | Published: 2017/3/25 23:41:24 | Tags: python pandas memory dataframe scikit-learn

Question

I want to convert a sparse matrix (156060x11780) to a DataFrame, but I get a memory error. This is my code:

vect = TfidfVectorizer(sublinear_tf=True, analyzer='word', 
                       stop_words='english' , tokenizer=tokenize,
                       strip_accents = 'ascii') 

X = vect.fit_transform(df.pop('Phrase')).toarray()

for i, col in enumerate(vect.get_feature_names()):
    df[col] = X[:, i]

The error occurs at X = vect.fit_transform(df.pop('Phrase')).toarray(). How can I solve it?
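For scale, here is a rough estimate of what .toarray() has to allocate for a matrix of this shape. It assumes 8-byte float64 values, which is the default dtype of TfidfVectorizer; the variable names are illustrative only.

```python
# Back-of-the-envelope size of the dense array produced by .toarray().
rows, cols = 156060, 11780   # shape given in the question
bytes_per_value = 8          # assumption: float64, the TfidfVectorizer default

dense_bytes = rows * cols * bytes_per_value
print(f"{rows * cols:,} values, about {dense_bytes / 1024**3:.1f} GiB when dense")
```

A single dense copy already exceeds the RAM of many machines, before the per-column copies made by the loop are counted.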

Recommended answer

Try this:

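import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# `tokenize` and `df` are the asker's own function and DataFrame.
vect = TfidfVectorizer(sublinear_tf=True, analyzer='word', stop_words='english',
                       tokenizer=tokenize,
                       strip_accents='ascii', dtype=np.float16)
X = vect.fit_transform(df.pop('Phrase'))  # NOTE: `.toarray()` was removed

# Densify one column at a time and store it sparsely, so the full dense
# matrix never exists in memory.
# Version note: pd.SparseSeries was removed in pandas 1.0 and
# get_feature_names() was removed in scikit-learn 1.2.
for i, col in enumerate(vect.get_feature_names()):
    df[col] = pd.SparseSeries(X[:, i].toarray().reshape(-1,), fill_value=0)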
