Reading part of a CSV file
Question
I have a really large CSV file, about 10 GB. Whenever I try to read it into an IPython notebook using
data = pd.read_csv("data.csv")
my laptop freezes. Is it possible to read just part of the file, say 10,000 rows or 500 MB?
Recommended answer
It is possible. You can create an iterator that yields your CSV in chunks of a fixed number of rows, each as a DataFrame, by passing iterator=True together with your desired chunksize to read_csv. (Passing chunksize alone has the same effect.)
import pandas as pd

df_iter = pd.read_csv('data.csv', chunksize=10000, iterator=True)
for iter_num, chunk in enumerate(df_iter, 1):
    print(f'Processing iteration {iter_num}')
    # do things with chunk
Or, more briefly:
for chunk in pd.read_csv('data.csv', chunksize=10000):
    # do things with chunk
    pass
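To make the "do things with chunk" step concrete, here is a minimal sketch (not part of the original answer) that counts rows and keeps only a filtered subset, so the full file never sits in memory at once. The small generated file, the column names id and value, and the threshold are all assumptions made for illustration.

```python
import os
import tempfile

import pandas as pd

# Build a small stand-in CSV so the sketch runs anywhere; in practice this
# would be the large file from the question. Columns are hypothetical.
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "data.csv")
pd.DataFrame({"id": range(100), "value": range(0, 200, 2)}).to_csv(csv_path, index=False)

kept = []        # filtered pieces, one per chunk
total_rows = 0   # running row count across the whole file

for chunk in pd.read_csv(csv_path, chunksize=30):
    total_rows += len(chunk)
    # Keep only the rows of interest from this chunk (assumed condition).
    kept.append(chunk[chunk["value"] >= 150])

# Combine the small filtered pieces into one DataFrame.
filtered = pd.concat(kept, ignore_index=True)
print(total_rows, len(filtered))
```

Only the filtered pieces are retained, so peak memory depends on the chunk size and on how many rows pass the filter rather than on the size of the file.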
Alternatively, if you only want to read a specific part of the CSV, you can use the skiprows and nrows options of read_csv to start at a particular line and then read n rows, as the names suggest.
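A short sketch of both options follows, assuming a file with a header line; the in-memory text and the column names id and value are stand-ins for the real file. Passing a range that starts at 1 to skiprows is one way to skip data rows while keeping the header at line 0.

```python
import io

import pandas as pd

# In-memory stand-in for the real file: a header line plus 20 data rows.
csv_text = "id,value\n" + "\n".join(f"{i},{i * 10}" for i in range(20))

# Read only the first 5 data rows.
head = pd.read_csv(io.StringIO(csv_text), nrows=5)

# Skip file lines 1-10 (the first 10 data rows) but keep the header on
# line 0, then read the next 5 rows.
middle = pd.read_csv(io.StringIO(csv_text), skiprows=range(1, 11), nrows=5)

print(head)
print(middle)
```

If skiprows were given as a plain integer instead, the header line would be skipped as well, and the first remaining line would be treated as the header.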