Pandas: how to designate starting row to extract data


Question

I am using the Pandas library with Python.

I have an Excel file with some heading information at the top of the sheet that I do not need for data extraction.

However, the heading information can span a varying number of rows, so its length is unpredictable.

So my data extraction should start from the row that says "ID". In this particular case that is row 5, but it could change.

The image is shown at the bottom (I grayed out everything after row 5 to hide sensitive information).

How do I express this in logic (skip the heading and jump to row 5)? The pattern is that the header row starts with "ID, EMP_ID", etc.

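import pandas as pd

# Note: .xls is a binary format, so reading it line by line as text may fail.
with open('File.xls') as fp:
    skip = next(filter(
        lambda x: x[1].startswith('ID'),  # x is an (index, line) pair
        enumerate(fp)
    ))[0]

df = pd.read_excel('File.xls', usecols=['ID', 'EMP_ID'], skiprows=skip)
print(df)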

Recommended answer

You could manually check for the header line and then use read_csv's skiprows keyword argument.

with open('data.csv') as fp:
    skip = next(filter(
        lambda x: x[1].startswith('ID'),
        enumerate(fp)
    ))[0]

Then skip those rows:

import pandas

df = pandas.read_csv('data.csv', skiprows=skip)

This way you can support pre-header sections of arbitrary length.
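The two steps can also be combined into one small, self-contained script. This is only a minimal sketch, not part of the original answer: the helper name find_header_row and the in-memory sample data are hypothetical, and it assumes the header line begins with the literal text ID and that no earlier line does.

```python
import io

import pandas as pd


def find_header_row(lines, marker="ID"):
    """Return the 0-based index of the first line that starts with marker."""
    for index, line in enumerate(lines):
        if line.startswith(marker):
            return index
    raise ValueError("no line starts with %r" % marker)


# Hypothetical sample: three free-form heading lines, then the real table.
SAMPLE = (
    "Quarterly report\n"
    "Generated by HR\n"
    "Confidential\n"
    "ID,EMP_ID,NAME\n"
    "1,100,Ann\n"
    "2,200,Bob\n"
)

# First pass: locate the header line. Second pass: let pandas skip up to it.
skip = find_header_row(io.StringIO(SAMPLE))
df = pd.read_csv(io.StringIO(SAMPLE), skiprows=skip)
print(df)
```

With a real file, both io.StringIO(SAMPLE) arguments would be replaced by an opened file and its path. Raising an error when no header is found avoids silently parsing the wrong rows.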

For Python 2:

import itertools as it

with open('data.csv') as fp:
    skip = next(it.ifilter(
        lambda x: x[1].startswith('ID'),
        enumerate(fp)
    ))[0]
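The question is about an Excel workbook rather than a CSV file, and a binary .xls file cannot be scanned line by line as text. One possible adaptation, offered as a sketch rather than as the answer's method, is to read the whole sheet without a header and then promote the row whose first cell is ID. The helper name promote_header and the sample frame are hypothetical, and the sketch assumes the marker appears in the first column.

```python
import pandas as pd


def promote_header(raw, marker="ID"):
    """Drop the rows above the one whose first cell equals marker,
    then use that row as the column header."""
    first_col = raw.iloc[:, 0].astype(str).str.strip()
    mask = (first_col == marker).tolist()
    if True not in mask:
        raise ValueError("no row starts with %r" % marker)
    start = mask.index(True)  # position of the header row
    table = raw.iloc[start + 1:].reset_index(drop=True)
    table.columns = raw.iloc[start].tolist()
    return table


# Stand-in for: raw = pd.read_excel('File.xls', header=None)
raw = pd.DataFrame([
    ["Quarterly report", None, None],
    ["Generated by HR", None, None],
    ["ID", "EMP_ID", "NAME"],
    [1, 100, "Ann"],
    [2, 200, "Bob"],
])

table = promote_header(raw)
print(table)
```

Because the sheet is read without a header, every column comes back with a generic object dtype, so a type conversion step may be needed afterwards.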
