动态跳过python pandas中excel的顶部空白行 [英] dynamically skip top blank rows of excel in python pandas
本文介绍了动态跳过python pandas中excel的顶部空白行的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
我正在使用python中的pandas读取多张Excel文件. 我有三种情况
I am reading multiple sheets of an excel file using pandas in python. I have three cases
- 某些工作表的数据来自第1行
osht=pd.DataFrame(filename+sheetname)
delimited table Example:
Country;Company;Product
US;ABC;XYZ
US;ABD;XYY
- ,有些在表格之前有n个空白行,而某些工作表具有以下内容的摘要: 我知道使用skip_blank可以摆脱顶部的空白行,但是顶部空白的行数实际上不是固定的,可能是3或4或8
- and some has n number of blank rows before table and some sheet has summary of I know using skip_blank I can get rid of top blank rows but number of top blank rows are not fixed in nature may be 3 or 4 or 8
delimited table Example:
;;
;;
;;
Country;Company;Product
US;ABC;XYZ
US;ABD;XYY
-
我在第一列中的
- 表正在尝试阅读所有这些表,但不确定如何 -有没有办法找出第三行的摘要结尾,第四行是我的表格标题,第一列的标题是国家/地区"
- table in 1st column I am trying to read all these sheet but not sure how -Is there any way to figure out that from 3rd rows summary ends and 4th row is my table heading and first column head is 'Country'
delimited table Example:
Product summary table for East region;;
Date: 1st Sep, 2016;;
;;
Country;Company;Product
US;ABC;XYZ
US;ABD;XYY
推荐答案
我会提出以下算法:
- 阅读整个表格
- 将不包含任何缺失值的第一行视为标题
- 删除标题上方的所有行
此代码对我来说还行:
import pandas as pd
for sheet in range(3):
raw_data = pd.read_excel('blank_rows.xlsx', sheetname=sheet, header=None)
print(raw_data)
# looking for the header row
for i, row in raw_data.iterrows():
if row.notnull().all():
data = raw_data.iloc[(i+1):].reset_index(drop=True)
data.columns = list(raw_data.iloc[i])
break
# transforming columns to numeric where possible
for c in data.columns:
data[c] = pd.to_numeric(data[c], errors='ignore')
print(data)
根据您的示例,它使用此玩具数据示例.从原始数据帧开始
It uses this toy data sample, based on your examples. From the raw dataframes
0 1 2
0 Country Company Product
1 US ABC XYZ
2 US ABD XYY
0 1 2
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 Country Company Product
4 US ABC XYZ
5 US ABD XYY
0 1 2
0 Product summary table for East region NaN NaN
1 Date: 1st Sep, 2016 NaN NaN
2 NaN NaN NaN
3 Country Company Product
4 US ABC XYZ
5 US ABD XYY
脚本生成相同的表
Country Company Product
0 US ABC XYZ
1 US ABD XYY
这篇关于动态跳过python pandas中excel的顶部空白行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!
查看全文