Python Pandas df.info


Problem Description

Total files to Process :  100
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1713078 entries, 0 to 1713077
Columns: 322 entries, #RIC to Reuters Classification Scheme.1
dtypes: object(322)
memory usage: 17.1 GB
None

I created a DataFrame from 100 CSV files, and above is the output of df.info(memory_usage='deep') for it. It shows 17.1 GB. What exactly does that mean? My Mac has only 16 GB of RAM... how am I able to process it? And how much can that grow... what would be the upper limit?

Answer

The 17.1 GB reported above is the estimated in-memory size of the DataFrame itself: with memory_usage='deep', pandas inspects the actual contents of object (string) columns instead of just counting the 8-byte pointers, so the figure can exceed your 16 GB of physical RAM (the OS then pages to swap, which is very slow).
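A minimal sketch of the difference between the two estimates (the column contents here are made up purely for illustration):

import pandas as pd

df = pd.DataFrame({"s": ["some fairly long string"] * 1_000_000})
# shallow estimate: counts only the object pointers in the column
print(df.memory_usage().sum())
# deep estimate: also counts the Python string objects themselves
print(df.memory_usage(deep=True).sum())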
pandas lets you work with very large CSV files, even ones that don't fit in memory; one way to do this is to read them in chunks:

import pandas as pd

reader = pd.read_csv(csv_filename, iterator=True, chunksize=1000)

where chunksize is the number of rows read into each chunk.

You can then iterate over the reader object that read_csv returns (a TextFileReader in current pandas), for example:

for df in reader:
    # process each chunk, which is itself a DataFrame
    your_processing(df)
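As a concrete usage example (data.csv and its value column are hypothetical, not part of the original question), here is the same pattern computing a running total so that only one chunk is ever held in memory:

import pandas as pd

total = 0
# read 1000 rows at a time; each iteration holds a single chunk
for df in pd.read_csv("data.csv", chunksize=1000):
    total += df["value"].sum()
print(total)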

Depending on your processing, you can even use multiprocessing to speed things up.
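As a hedged illustration of that idea (the file name, chunk size, and per-chunk function are all assumptions, not part of the original answer), chunks can be handed off to a process pool as they are read:

import multiprocessing as mp

import pandas as pd

def count_rows(df):
    # stand-in for real per-chunk work; defined at top level so it
    # can be pickled and shipped to the worker processes
    return len(df)

if __name__ == "__main__":
    reader = pd.read_csv("data.csv", chunksize=100_000)
    with mp.Pool(processes=4) as pool:
        # imap pulls chunks from the reader lazily and farms them out
        total = sum(pool.imap(count_rows, reader))
    print(total)

Note that each chunk is pickled and copied to a worker process, so this only pays off when the per-chunk work costs more than that transfer.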
