Pandas read_csv without knowing whether a header is present
Question
I have an input file with known columns, let's say two columns, Name and Sex. Sometimes it has the header line Name,Sex and sometimes it doesn't:
1.csv :
Name,Sex
John,M
Leslie,F
2.csv :
John,M
Leslie,F
Knowing the identity of the columns beforehand, is there a nice way to handle both cases with the same read_csv command? Basically, I want to specify names=['Name', 'Sex'] and then have it infer header=0 only when the header is there. The best I can come up with is:
1) Read the first line of the file before calling read_csv, and set the parameters accordingly.
2) Just do df = pd.read_csv(input_file, names=['Name', 'Sex']), then check whether the zeroth row is identical to the header, and if so drop it (and then maybe renumber the rows).
But this doesn't seem like that unusual a use case to me. Is there a built-in way of doing this with read_csv that I haven't thought of?
Recommended answer
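import pandas as pd
cols = ['Name', 'Sex']  # the known column names
df = (pd.read_csv(filename, header=None, names=cols)
        .query('Name != "Name" and Sex != "Sex"'))  # drop a header row if one was read as data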
I'm not sure that this is the most elegant way, but this should work as well:
df = pd.read_csv(filename, header=None, names=cols)
if (df.iloc[0] == cols).all():
    df = df[1:].reset_index(drop=True)
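For comparison, option 1 from the question (peeking at the first line before calling read_csv) can be sketched roughly as below. This is only a sketch under a few assumptions: the helper name read_known_columns is made up for illustration, the input is a path to a file on disk, and a header, when present, matches the known column names exactly. One possible advantage is that the header text never ends up in the data, so pandas' usual dtype inference is unaffected.

```python
import csv

import pandas as pd


def read_known_columns(path, cols):
    """Hypothetical helper: read a CSV whose header row may or may not be present."""
    # Parse only the first line of the file with the csv module.
    with open(path, newline="") as f:
        first_row = next(csv.reader(f), [])

    # Assumption: a header, if present, equals the known column names exactly.
    has_header = [cell.strip() for cell in first_row] == list(cols)

    # header=0 makes read_csv replace the existing header row with `names`;
    # header=None makes it treat every row as data.
    return pd.read_csv(path, names=cols, header=0 if has_header else None)
```

Calling read_known_columns('1.csv', ['Name', 'Sex']) and read_known_columns('2.csv', ['Name', 'Sex']) on the two sample files from the question should then give the same two-row DataFrame.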