动态跳过python pandas中excel的顶部空白行 [英] dynamically skip top blank rows of excel in python pandas

查看:1514
本文介绍了动态跳过python pandas中excel的顶部空白行的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用python中的pandas读取多张Excel文件. 我有三种情况

I am reading multiple sheets of an excel file using pandas in python. I have three cases

  1. 某些工作表的数据来自第1行

osht=pd.DataFrame(filename+sheetname)

delimited table Example: 
Country;Company;Product
US;ABC;XYZ
US;ABD;XYY

  1. ,有些在表格之前有n个空白行,而某些工作表具有以下内容的摘要: 我知道使用skip_blank可以摆脱顶部的空白行,但是顶部空白的行数实际上不是固定的,可能是3或4或8
  1. and some has n number of blank rows before table and some sheet has summary of I know using skip_blank I can get rid of top blank rows but number of top blank rows are not fixed in nature may be 3 or 4 or 8

delimited table Example: 
;;
;;
;;
Country;Company;Product
US;ABC;XYZ
US;ABD;XYY

    我在第一列中的
  1. 表正在尝试阅读所有这些表,但不确定如何 -有没有办法找出第三行的摘要结尾,第四行是我的表格标题,第一列的标题是国家/地区"
  1. table in 1st column I am trying to read all these sheet but not sure how -Is there any way to figure out that from 3rd rows summary ends and 4th row is my table heading and first column head is 'Country'

delimited table Example: 

Product summary table for East region;;
Date: 1st Sep, 2016;;
;;
Country;Company;Product
US;ABC;XYZ
US;ABD;XYY

推荐答案

我会提出以下算法:

  1. 阅读整个表格
  2. 将不包含任何缺失值的第一行视为标题
  3. 删除标题上方的所有行

此代码对我来说还行:

import pandas as pd
for sheet in range(3):
    raw_data = pd.read_excel('blank_rows.xlsx', sheetname=sheet, header=None)
    print(raw_data)
    # looking for the header row
    for i, row in raw_data.iterrows():
        if row.notnull().all():
            data = raw_data.iloc[(i+1):].reset_index(drop=True)
            data.columns = list(raw_data.iloc[i])
            break
    # transforming columns to numeric where possible
    for c in data.columns:
        data[c] = pd.to_numeric(data[c], errors='ignore')
    print(data)

根据您的示例,它使用此玩具数据示例.从原始数据帧开始

It uses this toy data sample, based on your examples. From the raw dataframes

         0        1        2
0  Country  Company  Product
1       US      ABC      XYZ
2       US      ABD      XYY

         0        1        2
0      NaN      NaN      NaN
1      NaN      NaN      NaN
2      NaN      NaN      NaN
3  Country  Company  Product
4       US      ABC      XYZ
5       US      ABD      XYY

                                       0        1        2
0  Product summary table for East region      NaN      NaN
1                    Date: 1st Sep, 2016      NaN      NaN
2                                    NaN      NaN      NaN
3                                Country  Company  Product
4                                     US      ABC      XYZ
5                                     US      ABD      XYY

脚本生成相同的表

  Country Company Product
0      US     ABC     XYZ
1      US     ABD     XYY

这篇关于动态跳过python pandas中excel的顶部空白行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆