Prevent pandas read_csv treating first row as header of column names
Question
I'm reading in a pandas DataFrame using pd.read_csv. I want to keep the first row as data, however it keeps getting converted to column names.
- I tried header=False, but this just deleted it entirely.
(Note on my input data: I have a string (st = '\n'.join(lst)) that I convert to a file-like object (io.StringIO(st)), then read the CSV from that file object.)
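A minimal reproduction of this setup, assuming a hypothetical list lst of comma-separated lines (the real data is not shown in the question). With the default header='infer' and no names argument, read_csv behaves as if header=0, so the first line is consumed as the column labels:

```python
import io

import pandas as pd

# Hypothetical input: the question does not show the real contents of lst.
lst = ["1,2,3", "4,5,6"]
st = "\n".join(lst)

# Default header='infer' acts like header=0 when names is not given,
# so the first line "1,2,3" becomes the column labels, not data.
df = pd.read_csv(io.StringIO(st))
print(df.columns.tolist())  # ['1', '2', '3']
print(len(df))              # 1 (only "4,5,6" is left as data)
```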
Answer
You want header=None. The False gets type-promoted to int, i.e. to 0, so header=False behaves exactly like header=0 and the first row is still used as the column names. See the docs:
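The promotion comes from plain Python: bool is a subclass of int, so a library that accepts an integer row number will, unless it checks for bool explicitly, treat False as 0. A quick check:

```python
# bool is a subclass of int, so False can stand in wherever an int is accepted.
is_int = isinstance(False, int)
as_int = int(False)

print(is_int)      # True
print(False == 0)  # True
print(as_int)      # 0
```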
header : int or list of ints, default ‘infer’ Row number(s) to use as the column names, and the start of the data. Default behavior is as if set to 0 if no names passed, otherwise None. Explicitly pass header=0 to be able to replace existing names. The header can be a list of integers that specify row locations for a multi-index on the columns e.g. [0,1,3]. Intervening rows that are not specified will be skipped (e.g. 2 in this example is skipped). Note that this parameter ignores commented lines and empty lines if skip_blank_lines=True, so header=0 denotes the first line of data rather than the first line of the file.
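Two points from that description can be sketched with the same small sample used below: passing header=0 together with names replaces the file's own labels, and with skip_blank_lines=True (the default) a leading blank line is ignored, so header=0 still refers to the first line of data. This is an illustration of the documented behaviour, not part of the original answer:

```python
import io

import pandas as pd

t = """a,b,c
0,1,2
3,4,5"""

# header=0 plus names: the first row is consumed as the header and
# then replaced by the supplied labels, so "a,b,c" is not kept as data.
replaced = pd.read_csv(io.StringIO(t), header=0, names=["x", "y", "z"])
print(replaced.columns.tolist())  # ['x', 'y', 'z']
print(len(replaced))              # 2

# skip_blank_lines=True (default): the leading empty line is skipped,
# so header=0 still points at "a,b,c" rather than the blank line.
with_blank = pd.read_csv(io.StringIO("\n" + t), header=0)
print(with_blank.columns.tolist())  # ['a', 'b', 'c']
```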
You can see the difference in behaviour, first with header=0:
```
In [95]:
import io
import pandas as pd
t="""a,b,c
0,1,2
3,4,5"""
pd.read_csv(io.StringIO(t), header=0)
Out[95]:
   a  b  c
0  0  1  2
1  3  4  5
```
Now with header=None:
```
In [96]:
pd.read_csv(io.StringIO(t), header=None)
Out[96]:
   0  1  2
0  a  b  c
1  0  1  2
2  3  4  5
```
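Applied to the question's pipeline, a minimal sketch (again assuming a hypothetical lst) looks like this. header=None keeps every line as data with integer column labels; adding names keeps every line as data but lets you choose the labels yourself:

```python
import io

import pandas as pd

# Hypothetical stand-in for the question's list of lines.
lst = ["a,b,c", "0,1,2", "3,4,5"]
st = "\n".join(lst)

# No row is treated as the header; columns get default labels 0, 1, 2.
df = pd.read_csv(io.StringIO(st), header=None)

# Same, but with user-chosen labels; the first line is still kept as data.
named = pd.read_csv(io.StringIO(st), header=None, names=["col1", "col2", "col3"])

print(df.shape)                # (3, 3)
print(named.iloc[0].tolist())  # ['a', 'b', 'c']
```

Because the first row holds strings, every column ends up with object dtype, so the numeric rows are read as strings such as '0' and '1'; convert them afterwards if needed.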
Note that as of version 0.19.1 (the latest at the time of writing), passing header=False raises a TypeError:
```
In [98]:
pd.read_csv(io.StringIO(t), header=False)
TypeError: Passing a bool to header is invalid. Use header=None for no header or header=int or list-like of ints to specify the row(s) making up the column names
```
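A small sketch for checking this on whatever pandas version is installed, assuming it is recent enough to include the bool check described above: the header=False call is expected to raise TypeError, while header=None reads all three lines as data.

```python
import io

import pandas as pd

t = "a,b,c\n0,1,2\n3,4,5"

# Versions with the bool check reject header=False outright.
raised = False
try:
    pd.read_csv(io.StringIO(t), header=False)
except TypeError as exc:
    raised = True
    print(exc)

# header=None is the supported way to say "there is no header row".
df = pd.read_csv(io.StringIO(t), header=None)
print(raised, df.shape)  # True (3, 3) on versions with the check
```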