Python Pandas df.info
Problem description
Total files to Process : 100
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1713078 entries, 0 to 1713077
Columns: 322 entries, #RIC to Reuters Classification Scheme.1
dtypes: object(322)
memory usage: 17.1 GB
None
I created a DataFrame from 100 CSV files, and above you can see the output of df.info(memory_usage='deep') for it.
It shows 17.1 GB.
What exactly does that mean?
My Mac has only 16 GB of RAM, so how am I able to process it?
And how much can that grow? What would be the upper limit?
Recommended answer
pandas allows you to work with very large CSV files, even ones that don't fit in memory. One way to do this is to read the file in chunks:
import pandas as pd

reader = pd.read_csv(csv_filename, iterator=True, chunksize=1000)
where chunksize is the number of rows read into each chunk.
You can then iterate over the returned TextFileReader object, for example:
for df in reader:
    # process each chunk as a regular DataFrame
    your_processing(df)
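Putting the pieces together, here is a minimal self-contained sketch of chunked processing. The file name and the per-chunk aggregation (a running row count and column sum) are illustrative stand-ins for your own data and processing:

```python
import pandas as pd

# Write a small CSV so the example is self-contained.
pd.DataFrame({"price": range(10), "volume": range(10, 20)}).to_csv(
    "ticks.csv", index=False
)

total_rows = 0
price_sum = 0

# Read the file 4 rows at a time instead of loading it all into memory.
for chunk in pd.read_csv("ticks.csv", chunksize=4):
    total_rows += len(chunk)           # running count of rows seen
    price_sum += chunk["price"].sum()  # running aggregate per chunk

print(total_rows)  # 10
print(price_sum)   # 45
```

Because each chunk is a regular DataFrame, only chunksize rows are held in memory at once; the aggregates carry state across chunks.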
Depending on your processing, you can even use multiprocessing to speed things up.
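For instance, here is a rough sketch using multiprocessing.Pool, assuming each chunk can be processed independently (the file name and the row_count worker are illustrative; the worker must be a top-level function so it can be pickled):

```python
import pandas as pd
from multiprocessing import Pool

def row_count(chunk):
    # Stand-in for real per-chunk work; must be a top-level function.
    return len(chunk)

if __name__ == "__main__":
    # Write a small CSV so the example is self-contained.
    pd.DataFrame({"x": range(100)}).to_csv("big.csv", index=False)

    reader = pd.read_csv("big.csv", chunksize=25)
    with Pool(processes=2) as pool:
        # Each chunk is shipped to a worker process as it is read.
        counts = pool.map(row_count, reader)

    print(sum(counts))  # 100
```

Note that each chunk is pickled and sent to a worker, so this pays off only when the per-chunk work is heavier than that transfer cost.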