How can I read only the header column of a CSV file using Python?


Question

I am looking for a way to read just the header row of a large number of large CSV files.

Using Pandas, I have this method available for each CSV file:

>>> df = pd.read_csv(PATH_TO_CSV)
>>> df.columns

I could also do this with just the csv module:

>>> reader = csv.DictReader(open(PATH_TO_CSV))
>>> reader.fieldnames

The problem with these is that each CSV file is 500MB+ in size, and it seems like a gigantic waste to read in each entire file just to pull the header line.

My end goal is to pull out the unique column names. I can do that once I have a list of the column headers in each of these files.

How can I extract only the header row of a CSV file, quickly?

Recommended answer

I've used iglob as an example to search for the .csv files. One way is to collect the header names in a set, then adjust as necessary, e.g.:

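import csv
from glob import iglob

unique_headers = set()
for filename in iglob('*.csv'):
    # Python 3: open in text mode with newline='' (use 'rb' on Python 2)
    with open(filename, newline='') as fin:
        csvin = csv.reader(fin)
        # Parse only the first row; fall back to [] if the file is empty
        unique_headers.update(next(csvin, []))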
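
The same idea can be wrapped in small helper functions. The following is a minimal Python 3 sketch, not part of the original answer: the function names read_header and collect_unique_headers, the utf-8 default encoding, and the two sample files are all assumptions made for illustration. It relies on csv.reader being lazy, so calling next() once parses only the first record instead of the whole file.

```python
import csv
import os
import tempfile


def read_header(path, encoding="utf-8"):
    """Return the first row of a CSV file as a list of column names.

    The utf-8 default is an assumption; pass the real encoding if it differs.
    """
    # newline='' lets the csv module handle line endings itself.
    with open(path, newline="", encoding=encoding) as fin:
        # csv.reader is lazy: next() parses a single record.
        # The [] default covers an empty file.
        return next(csv.reader(fin), [])


def collect_unique_headers(paths):
    """Return the set of distinct column names across several CSV files."""
    headers = set()
    for path in paths:
        headers.update(read_header(path))
    return headers


# Small demonstration with two throwaway files (hypothetical data).
with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "first.csv")
    second = os.path.join(tmp, "second.csv")
    with open(first, "w", newline="", encoding="utf-8") as f:
        f.write("id,name\n1,Ann\n")
    with open(second, "w", newline="", encoding="utf-8") as f:
        f.write("id,price\n1,9.5\n")
    demo_first = read_header(first)
    demo_all = collect_unique_headers([first, second])

print(demo_first)
print(sorted(demo_all))
```

For real data, the list of paths could come from iglob('*.csv') as in the answer above. Keeping the per-file step in its own function makes it easy to also record which file each header came from, if that is ever needed.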
