Python Pandas read_csv skip rows but keep header

Problem description

I'm having trouble figuring out how to skip n rows in a CSV file but keep the header, which is the first row.

What I want to do is iterate but keep the header from the first row. skiprows makes the header the first row after the skipped rows. What is the best way of doing this?

data = pd.read_csv('test.csv', sep='|', header=0, skiprows=10, nrows=10)

Solution

You can pass a list of row numbers to skiprows instead of an integer.

By giving the function the integer 10, you're just skipping the first 10 lines.

To keep row 0 as the header and skip rows 1 through 9, so that the data starts at row 10, you can write:

pd.read_csv('test.csv', sep='|', skiprows=range(1, 10))
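
Since the question is about iterating while keeping the header, here is a minimal sketch of that on an in-memory file, so it runs without test.csv. The sample data, the column names id and value, and the helper read_block are assumptions made up for illustration; only sep, skiprows and nrows come from the question and the answer above.

```python
import io

import pandas as pd

# Hypothetical stand-in for test.csv: one header line plus 30 data rows.
lines = ["id|value"] + [f"{i}|{i * i}" for i in range(1, 31)]
csv_text = "\n".join(lines)


def read_block(start, size):
    """Skip `start` data rows, then read `size` rows, keeping line 0 as the header."""
    return pd.read_csv(
        io.StringIO(csv_text),
        sep="|",
        skiprows=range(1, start + 1),  # skip file lines 1..start, never line 0
        nrows=size,
    )


block = read_block(start=10, size=10)
print(list(block.columns))   # ['id', 'value']
print(block["id"].tolist())  # ids 11 through 20
```

Calling read_block with start=20 would give the next block with the same column names. For a plain sequential pass over a large file, the chunksize parameter of read_csv may be a simpler choice, because it reads the header only once.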


Other ways to skip rows using read_csv

The two main ways to control which rows read_csv uses are the header and skiprows parameters.

Suppose we have the following CSV file with one column:

a
b
c
d
e
f

In each of the examples below, this file is f = io.StringIO("\n".join("abcdef")).

  • Read all lines as values (no header; column names default to integers):

    >>> pd.read_csv(f, header=None)
       0
    0  a
    1  b
    2  c
    3  d
    4  e
    5  f
    

  • Use a particular row as the header (skip all lines before that):

    >>> pd.read_csv(f, header=3)
       d
    0  e
    1  f
    

  • Use multiple rows as the header, creating a MultiIndex (skip all lines before the last specified header line):

    >>> pd.read_csv(f, header=[2, 4])
       c
       e
    0  f
    

  • Skip N rows from the start of the file (the first row that's not skipped is the header):

    >>> pd.read_csv(f, skiprows=3)
       d
    0  e
    1  f
    

  • Skip one or more rows by giving the row indices (the first row that's not skipped is the header):

    >>> pd.read_csv(f, skiprows=[2, 4])
       a
    0  b
    1  d
    2  f
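
The header and skiprows examples above can be compared side by side in a short script. This is a sketch only: the helper make_file is a hypothetical name, added because read_csv consumes the buffer and each call needs a fresh one. The last call passes a callable to skiprows, which read_csv also accepts, as another way to keep line 0 as the header.

```python
import io

import pandas as pd


def make_file():
    # read_csv consumes the buffer, so build a fresh one for every call.
    return io.StringIO("\n".join("abcdef"))


# header=3: line 3 ("d") becomes the header, lines 0-2 are dropped.
by_header = pd.read_csv(make_file(), header=3)

# skiprows=3: lines 0-2 are dropped, the first remaining line ("d") is the header.
by_skiprows = pd.read_csv(make_file(), skiprows=3)

# A callable receives each line index and returns True for lines to skip.
# Here lines 1-3 are skipped, so line 0 ("a") stays as the header.
keep_header = pd.read_csv(make_file(), skiprows=lambda i: 0 < i < 4)

print(by_header.equals(by_skiprows))  # True
print(list(keep_header.columns), keep_header["a"].tolist())  # ['a'] ['e', 'f']
```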
    
