Why does a pandas DataFrame consume much more RAM than the size of the original text file?

Views: 62 · Published: 2020/5/24 1:07:48 · Tags: python, pandas

Problem description

I'm trying to import a large tab-separated text file (about 3 GB) into Python using pandas: pd.read_csv("file.txt", sep="\t"). The file was originally a ".tab" file; I changed its extension to ".txt" to import it with read_csv(). It has 305 columns and roughly 1,000,000 rows.

When I execute the code, Python raises a MemoryError after some time. From what I found, this basically means that there is not enough RAM available. When I specify nrows=20 in read_csv(), it works fine.
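Because a small nrows read succeeds, one rough way to see how far off the full load is would be to measure a small sample and scale it up. The sketch below is only an illustration: the helper name estimate_full_size and the stand-in data are hypothetical, and the estimate assumes that the first rows are representative of the whole file, which may not hold (dtype inference on 20 rows can differ from the full file).

```python
import io
import pandas as pd

def estimate_full_size(source, sample_rows, total_rows, sep="\t"):
    """Roughly extrapolate in-memory bytes from the first `sample_rows` rows."""
    sample = pd.read_csv(source, sep=sep, nrows=sample_rows)
    # deep=True includes the Python objects behind object-dtype columns.
    sample_bytes = sample.memory_usage(deep=True, index=False).sum()
    bytes_per_row = sample_bytes / len(sample)
    return int(bytes_per_row * total_rows)

# Hypothetical stand-in data; with the real file, pass "file.txt" instead.
demo = io.StringIO("id\tvalue\n" + "".join(f"s{i}\t{i}\n" for i in range(100)))
estimate = estimate_full_size(demo, sample_rows=20, total_rows=1_000_000)
print(f"estimated size: {estimate / 1e9:.2f} GB")
```

The result is only an order-of-magnitude figure for the finished DataFrame; it says nothing about temporary memory used while parsing.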

The computer I'm using has 46 GB of RAM, of which roughly 20 GB was available for Python.

My question: how can a 3 GB file need more than 20 GB of RAM to be imported into Python with pandas read_csv()? Am I doing anything wrong?

EDIT: When executing df.dtypes, the types are a mix of object, float64, and int64.
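The dtype mix matters for the memory question. As a minimal sketch (the tiny in-memory table and its column names gene, count, and score are made up for illustration and are not taken from the original file), DataFrame.memory_usage(deep=True) reports how many bytes each column actually occupies, including the Python string objects that text columns may reference:

```python
import io
import pandas as pd

# Hypothetical tab-separated data standing in for the real file.
text = "gene\tcount\tscore\n" + "".join(
    f"gene_{i}\t{i}\t{i * 0.5}\n" for i in range(1000)
)
df = pd.read_csv(io.StringIO(text), sep="\t")

print(df.dtypes)

# Per-column bytes; deep=True also counts the string data itself,
# not just the 8-byte references stored in an object column.
usage = df.memory_usage(deep=True, index=False)
print(usage)

text_bytes = len(text.encode("utf-8"))
print(f"text size: {text_bytes} bytes, in-memory size: {usage.sum()} bytes")
```

Numeric columns cost a fixed 8 bytes per value for int64 and float64, however few characters the value took in the text file, while string columns carry additional per-value overhead. Comparing the two printed sizes on a sample of the real data would show how much each dtype contributes.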

UPDATE: I used the following code to overcome the problem and perform my calculations:

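import pandas as pd

rows = []  # collect results in a list; DataFrame.append was removed in pandas 2.0
x = 0  # assumption: the original post does not show how x was initialized
while x < 352:
    x = x + 1
    # Read only column x, so a single column is held in memory at a time.
    sample_col = pd.read_csv("file.txt", sep="\t", usecols=[x])
    name = sample_col.columns[0]
    rows.append({"sample": name, "read sum": sample_col[name].sum()})
    del sample_col
summed_cols = pd.DataFrame(rows, columns=["sample", "read sum"])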

It now selects a column, performs a calculation, stores the result in a DataFrame, deletes the current column, and moves on to the next column.
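The column-by-column loop re-reads the file once per column. A possible alternative, sketched here under the assumption that only per-column sums are needed, is the chunksize parameter of read_csv, which yields the file in row blocks so it is parsed only once. The stand-in data and its column names are hypothetical.

```python
import io
import pandas as pd

# Hypothetical stand-in for "file.txt": an ID column plus two sample columns.
text = "id\tsample_a\tsample_b\n" + "".join(
    f"r{i}\t{i}\t{2 * i}\n" for i in range(10)
)

totals = None
# With chunksize set, read_csv returns an iterator of DataFrames.
for chunk in pd.read_csv(io.StringIO(text), sep="\t", chunksize=4):
    # Sum the numeric columns of this block only.
    partial = chunk.select_dtypes(include="number").sum()
    totals = partial if totals is None else totals.add(partial, fill_value=0)

# Reshape the running totals into the same two-column layout as above.
summed_cols = totals.rename_axis("sample").reset_index(name="read sum")
print(summed_cols)
```

Only one block of rows is in memory at a time, so peak usage is governed by the chunk size rather than by the file size. With the real file, a much larger chunksize than 4 would be appropriate.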
