python - Using pandas structures with large csv (iterate and chunksize)
Problem description
I have a large csv file, about 600 MB with 11 million rows, and I want to create statistical data like pivots, histograms, graphs, etc. Obviously, just trying to read it normally:
df = pd.read_csv('Check400_900.csv', sep='\t')
doesn't work, so I found iterate and chunksize in a similar post, so I used:
df = pd.read_csv('Check1_900.csv', sep='\t', iterator=True, chunksize=1000)
All good, I can for example print df.get_chunk(5) and iterate over the whole file with just:
for chunk in df:
    print(chunk)
My problem is that I don't know how to use things like those below on the whole df, not just on one chunk:
plt.plot()
print(df.head())
print(df.describe())
print(df.dtypes)
customer_group3 = df.groupby('UserID')
y3 = customer_group3.size()
I hope my question is not too confusing.
Answer
I think you need to concat the chunks into a df, because the output type of:
df = pd.read_csv('Check1_900.csv', sep='\t', iterator=True, chunksize=1000)
isn't a DataFrame, but a pandas.io.parsers.TextFileReader - source.
tp = pd.read_csv('Check1_900.csv', sep='\t', iterator=True, chunksize=1000)
print(tp)
# <pandas.io.parsers.TextFileReader object at 0x00000000150E0048>
df = pd.concat(tp, ignore_index=True)
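Once the chunks are concatenated, the whole-frame operations from the question work as usual. A minimal sketch, using a small in-memory stand-in for Check1_900.csv (the csv_data contents here are made up for illustration):

```python
import io
import pandas as pd

# Hypothetical stand-in for Check1_900.csv
csv_data = "UserID\tAmount\n1\t10\n2\t20\n1\t30\n"

# Read in chunks, then concatenate into one regular DataFrame
tp = pd.read_csv(io.StringIO(csv_data), sep='\t', iterator=True, chunksize=2)
df = pd.concat(tp, ignore_index=True)

# Whole-frame operations from the question now work
print(df.describe())
customer_group3 = df.groupby('UserID')
y3 = customer_group3.size()
print(y3)
```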
I think it is necessary to add the parameter ignore_index to the concat function, to avoid duplicate indexes.
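If the concatenated frame itself is too big for memory, one alternative (a sketch, not part of the answer above) is to aggregate each chunk separately and combine the partial results, e.g. with Series.add and fill_value so missing groups count as zero:

```python
import io
import pandas as pd

# Hypothetical stand-in for Check1_900.csv
csv_data = "UserID\tAmount\n1\t10\n2\t20\n1\t30\n3\t40\n2\t50\n"

reader = pd.read_csv(io.StringIO(csv_data), sep='\t', chunksize=2)

# Per-chunk groupby sizes, merged across chunks without loading everything
counts = pd.Series(dtype='int64')
for chunk in reader:
    counts = counts.add(chunk.groupby('UserID').size(), fill_value=0)

counts = counts.astype(int)
print(counts)
```

This works for additive statistics like counts and sums; something like describe() still needs the full data or a streaming algorithm.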