Filter out rows from CSV before loading to pandas dataframe

Views: 49 | Published: 2020/7/11 23:42:43 | Tags: python, python-2.7, csv, pandas

Problem Description

I have a large CSV file that I cannot load into a DataFrame using read_csv() due to memory issues. However, the first column of the CSV holds a {0,1} flag, and I only need to load the rows with a '1', which will easily be small enough to fit in a DataFrame. Is there any way to load the data with a condition, or to manipulate the CSV prior to loading it (similar to grep)?

Recommended Answer

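You can use the comment parameter of pd.read_csv and set it to '0'. pandas skips any line that begins with the comment character, so every row whose flag is 0 is dropped while the file is being parsed: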

import pandas as pd
from io import StringIO

txt = """col1,col2
1,a
0,b
1,c
0,d"""

pd.read_csv(StringIO(txt), comment='0')

   col1 col2
0     1    a
1     1    c
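A caveat worth checking before relying on this: comment is not limited to the first column. pandas ignores the rest of a line from the first occurrence of the comment character, wherever it appears, so a 0 inside any other value truncates that row. A small sketch with made-up data (the numeric col2 values are an assumption for illustration, not from the question):

```python
from io import StringIO

import pandas as pd

# Made-up data: the second column contains zeros inside its values.
txt = """col1,col2
1,10
0,20
1,30"""

# '0' acts as a comment character anywhere in a line, not only at the start.
df = pd.read_csv(StringIO(txt), comment='0')
print(df['col2'].tolist())  # [1, 3], not [10, 30]: text after each '0' was dropped
```

So the comment trick is only safe when 0 never appears outside the flag column; otherwise, filter on the parsed column instead, as in the chunked approach.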

You can also pass chunksize to pd.read_csv to turn it into an iterator of DataFrames, filter each chunk with query, and combine the results with pd.concat.
NOTE: As the OP pointed out, a chunk size of 1 isn't realistic; I used it for demonstration purposes only. Increase it to suit your needs.

pd.concat([df.query('col1 == 1') for df in pd.read_csv(StringIO(txt), chunksize=1)])
# Equivalent to, but slower than, plain boolean indexing; for better performance use:
# pd.concat([df[df.col1 == 1] for df in pd.read_csv(StringIO(txt), chunksize=1)])

   col1 col2
0     1    a
2     1    c
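The question also asked about manipulating the file before pandas sees it, similar to grep. A minimal sketch of that idea (an alternative, not part of the original answer), assuming Python 3 and that the flag is always the first field: stream the rows with the standard csv module, keep the header plus the rows whose first field is exactly '1', and hand only those to pd.read_csv. The helper name read_flagged_rows is hypothetical. Unlike comment='0', it compares the whole first field, so zeros elsewhere in a row are left alone.

```python
import csv
import io

import pandas as pd


def read_flagged_rows(lines, flag="1"):
    """Hypothetical helper: keep the header plus rows whose first field equals
    `flag`, then parse only those rows with pandas.

    Rows are read one at a time, so apart from the current line only the
    matching rows are held in memory.
    """
    reader = csv.reader(lines)
    header = next(reader)
    kept = io.StringIO()
    writer = csv.writer(kept)
    writer.writerow(header)
    for row in reader:
        # Exact match on the whole first field (blank lines yield an empty row).
        if row and row[0] == flag:
            writer.writerow(row)
    kept.seek(0)  # rewind the in-memory buffer before pandas reads it
    return pd.read_csv(kept)


txt = """col1,col2
1,a
0,b
1,c
0,d"""

df = read_flagged_rows(io.StringIO(txt))
print(df)
```

For a real file, pass an open file object instead, for example `with open('big.csv', newline='') as f: df = read_flagged_rows(f)`, where big.csv is a placeholder path.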
