How to skip rows based on regex with pandas.read_excel?


Problem description

I am trying to read an Excel sheet with pandas.read_excel. Its skiprows argument allows skipping rows by supplying row numbers. However, how can I skip rows based on a pattern match? I have different Excel sheets in which the number of rows to skip varies, so supplying a fixed number of rows won't work for my use case. Is there a way to supply a pattern, e.g. skip all rows before the row that contains a specific string (say 'Test')? If this can't be done with pandas read_excel, is there an alternative workaround for reading the Excel sheet into a DataFrame this way? Any suggestions would be much appreciated. Thanks.

Recommended answer

My suggestion would be to read the entire Excel sheet into a DataFrame and afterwards drop the unwanted rows. As a simple example:

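import pandas as pd

# Read the first sheet of the workbook; header=None keeps every row as data,
# because the real header row sits somewhere further down the sheet
df = pd.read_excel('workbook.xlsx', header=None)

# True for each row whose value in column 0 is exactly 'Test'
is_test = df.iloc[:, 0] == 'Test'
if not is_test.any():
    raise ValueError("'Test' not found in column 0")

# idxmax() on a boolean Series returns the label of the first True
row_label = is_test.idxmax()

# Drop all rows above the row with 'Test' (.loc slicing includes row_label)
df = df.loc[row_label:, :]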
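The answer above matches the exact string 'Test' in column 0 only. Since the question asks about a pattern, here is a minimal sketch of a regex variant, assuming the sheet has already been read with header=None so that every row is data. trim_to_marker is a hypothetical helper, not a pandas API: it tests every cell with Python's re.search, keeps the rows from the first match onward, and optionally promotes the matched row to column names. (read_excel's own skiprows parameter also accepts a callable, but pandas evaluates that callable against row indices, not cell contents, so it cannot do the matching by itself.)

```python
import re

import pandas as pd


def trim_to_marker(df: pd.DataFrame, pattern: str, promote_header: bool = True) -> pd.DataFrame:
    """Drop every row above the first row in which any cell matches `pattern`.

    Hypothetical helper: `pattern` is a regular expression applied with
    re.search to each cell's string form; NaN/None cells never match.
    """
    regex = re.compile(pattern)
    # Cell-wise test: True where the cell is non-missing and its text matches
    hits = df.apply(lambda col: col.map(lambda v: pd.notna(v) and bool(regex.search(str(v)))))
    row_has_match = hits.any(axis=1)
    if not row_has_match.any():
        raise ValueError(f"no row matches {pattern!r}")
    # Position (not label) of the first matching row
    start = int(row_has_match.to_numpy().argmax())
    out = df.iloc[start:]
    if promote_header:
        # Use the matched row as column names and keep only the rows below it
        out = out.iloc[1:].set_axis(out.iloc[0].tolist(), axis=1)
    return out.reset_index(drop=True)
```

With a real workbook this would be called as trim_to_marker(pd.read_excel('workbook.xlsx', header=None), r'^Test$'). Anchoring the pattern with ^ and $ avoids matching cells that merely contain 'Test' as a substring; searching all columns rather than column 0 is a design choice of this sketch.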
