Finding the row number for the header row in a CSV file / Pandas DataFrame

Problem description

I am trying to get the index, or row number, of the row that holds the headers in my CSV file. The problem is that the header row can move up or down depending on the output of the report from our system (I have no control over this).

Code:

import pandas as pd

ht = pd.read_csv('file.csv')
test = ht.get_loc('Code')  # 'Code' is the header I'm using to locate the header row
csv1 = pd.read_csv('file.csv', header=test)
df1 = df1.append(csv1)  # appending because I have many files

If I were to print test, I would expect a number around 4 or 5, and that is what I feed into the second read_csv call.

The error I get says it expected 1 header column, but I have 26 columns. I am only trying to use the first header string to find the row number.

Thanks :-)

CSV format

This file contains the data around the volume of items blablalbla
the deadlines for delivery of items a - z is 5 days
the deadlines for delivery of items aa through zz are 3 days
the deadlines for delivery of items aaa through zzz are 1 days
code,type,arrived_date,est_del_date
a/wrwgwr12/001,kids,12-dec-18,17-dec-18
aa/gjghgj35/030,pet,15-dec-18,18-dec-18

As you can see, the "the deadlines" rows all look alike. There can be 3 or 5 of them depending on the code IDs, so the header row can move up or down.

I also did not write out all 26 column headers; I'm not sure whether that matters.

Desired DataFrame format

index | code            | type | arrived_date | est_del_date
1     | a/wrwgwr12/001  | kids | 12-dec-18    | 17-dec-18
2     | aa/gjghgj35/030 | pet  | 15-dec-18    | 18-dec-18

Hope this makes sense.

Thanks

Recommended answer

You can use the csv module to find the first row that contains a delimiter, then pass that row's index as the skiprows parameter to pd.read_csv:

from io import StringIO
import csv
import pandas as pd

x = """This file contains the data around the volume of items blablalbla
the deadlines for delivery of items a - z is 5 days
the deadlines for delivery of items aa through zz are 3 days
the deadlines for delivery of items aaa through zzz are 1 days
code,type,arrived_date,est_del_date
a/wrwgwr12/001,kids,12-dec-18,17-dec-18
aa/gjghgj35/030,pet,15-dec-18,18-dec-18"""

# replace StringIO(x) with open('file.csv', 'r')
with StringIO(x) as fin:
    reader = csv.reader(fin)
    idx = next(idx for idx, row in enumerate(reader) if len(row) > 1)  # 4

# replace StringIO(x) with 'file.csv'
df = pd.read_csv(StringIO(x), skiprows=idx)

print(df)

              code  type arrived_date est_del_date
0   a/wrwgwr12/001  kids    12-dec-18    17-dec-18
1  aa/gjghgj35/030   pet    15-dec-18    18-dec-18
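
The delimiter test above assumes that no preamble line contains a comma. If that cannot be guaranteed, one possible variation is to look for the known first header name instead, which is closer to what the question attempted with 'Code'. The sketch below is only an illustration under stated assumptions: the helper names find_header_row and read_report and the two sample reports are made up for this example, and the match on the first cell is case-insensitive because the question writes 'Code' while the sample file has 'code'. It also collects the frames with pd.concat, since DataFrame.append was removed in pandas 2.0.

```python
import csv
from io import StringIO

import pandas as pd


def find_header_row(lines, first_column="code"):
    """Return the 0-based index of the first row whose first cell equals first_column.

    Hypothetical helper: assumes the preamble has no multi-line quoted fields,
    so csv row indices line up with physical line numbers.
    """
    target = first_column.strip().lower()
    for idx, row in enumerate(csv.reader(lines)):
        if row and row[0].strip().lower() == target:
            return idx
    raise ValueError(f"no header row starting with {first_column!r}")


def read_report(text, first_column="code"):
    """Parse one report, skipping the preamble above the header row."""
    skip = find_header_row(StringIO(text), first_column)
    return pd.read_csv(StringIO(text), skiprows=skip)


# Two made-up reports with preambles of different lengths. The first preamble
# line of report_a contains a comma, which a pure delimiter count would
# mistake for the header row.
report_a = """Items, volumes and deadlines
items a - z: 5 days
code,type,arrived_date,est_del_date
a/wrwgwr12/001,kids,12-dec-18,17-dec-18"""

report_b = """Volume report
items a - z: 5 days
items aa - zz: 3 days
code,type,arrived_date,est_del_date
aa/gjghgj35/030,pet,15-dec-18,18-dec-18"""

frames = [read_report(text) for text in (report_a, report_b)]
# Collect the frames in a list and concatenate once, instead of calling
# DataFrame.append repeatedly as in the question.
combined = pd.concat(frames, ignore_index=True)
print(combined)
```

For files on disk, the same idea should work by opening each file with open(path, newline='') for the header search and then passing the path itself to pd.read_csv with the resulting skiprows value.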
