Pandas read_csv without knowing whether header is present


Question

I have an input file with known columns, let's say two columns, Name and Sex. Sometimes it has the header line Name,Sex, and sometimes it doesn't:

1.csv :

Name,Sex
John,M
Leslie,F

2.csv :

John,M
Leslie,F

Knowing the identity of the columns beforehand, is there a nice way to handle both cases with the same read_csv command? Basically, I want to specify names=['Name', 'Sex'] and then have it infer header=0 only when the header is there. The best I can come up with is:

1) Read the first line of the file before doing read_csv, and set parameters appropriately.

2) Just do df = pd.read_csv(input_file, names=['Name', 'Sex']), then check whether the zeroth row is identical to the header, and if so drop it (and then maybe renumber the rows).

But this doesn't seem like that unusual a use case to me. Is there a built-in way of doing this with read_csv that I haven't thought of?

Recommended answer

Using new functionality, the .query() method:

df = (pd.read_csv(filename, header=None, names=cols)
        .query('Name != "Name" and Sex != "Sex"'))
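To see the filter applied to both file variants, here is a minimal sketch that uses in-memory text in place of 1.csv and 2.csv. The `load` helper and the `cols` list are names assumed for illustration, not part of the original answer. Note that the filter applies to every row, not only the first, and drops a row as soon as either column matches its header name.

```python
import io

import pandas as pd

cols = ["Name", "Sex"]  # known column names (assumed for this example)

WITH_HEADER = "Name,Sex\nJohn,M\nLeslie,F\n"  # same contents as 1.csv
WITHOUT_HEADER = "John,M\nLeslie,F\n"         # same contents as 2.csv


def load(text):
    """Hypothetical helper: read CSV text and filter out a header row if present."""
    df = pd.read_csv(io.StringIO(text), header=None, names=cols)
    # Keep only rows whose values differ from the column names,
    # then renumber the remaining rows from 0.
    return df.query('Name != "Name" and Sex != "Sex"').reset_index(drop=True)


df1 = load(WITH_HEADER)
df2 = load(WITHOUT_HEADER)
print(df1.equals(df2))
```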

I'm not sure that this is the most elegant way, but this should work as well:

df = pd.read_csv(filename, header=None, names=cols)

if (df.iloc[0] == cols).all():
    df = df[1:].reset_index(drop=True)
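Option 1 from the question (peek at the first line, then choose the parameters) can also be sketched. This is only an illustration under stated assumptions: `read_with_optional_header` is a hypothetical helper, not a pandas function, and it assumes a seekable text file object and a non-empty file.

```python
import csv
import io

import pandas as pd

cols = ["Name", "Sex"]  # known column names (assumed for this example)


def read_with_optional_header(source, names):
    """Hypothetical helper: `source` must be a seekable text file object."""
    # Parse only the first record to see whether it equals the column names.
    first_row = next(csv.reader(source), None)
    source.seek(0)  # rewind so read_csv sees the whole file
    # header=0 together with names= replaces the existing header row;
    # header=None treats every row as data.
    header = 0 if first_row == list(names) else None
    return pd.read_csv(source, header=header, names=names)


df1 = read_with_optional_header(io.StringIO("Name,Sex\nJohn,M\nLeslie,F\n"), cols)
df2 = read_with_optional_header(io.StringIO("John,M\nLeslie,F\n"), cols)
print(df1.equals(df2))
```

For a file on disk, the same helper could be called inside `with open(filename, newline="") as f:`. One reason to prefer this over dropping the row afterwards is that the header line is never parsed as data, so numeric columns keep their inferred dtypes instead of falling back to strings.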
