How to read merged Excel cells with NaN into Pandas DataFrame


Question

I would like to read an Excel sheet into a Pandas DataFrame. However, the sheet contains merged cells as well as empty rows (fully or partially filled with NaN). To clarify, John H. has placed a single order for all the albums from "The Bodyguard" to "Red Pill Blues".

When I read this sheet into a Pandas DataFrame, the data does not come through correctly. Pandas treats a merged range as a single cell, so the other rows it spans are NaN where I would like the merged value repeated.

Please note that the last row does not contain merged cells; it only carries a value in the Artist column.


Edit: I tried the following to forward-fill the NaN values (taken from "Pandas: Reading Excel with merged cells"):

df.index = pd.Series(df.index).fillna(method='ffill')  

However, the NaN values remain. What strategy or method could I use to populate the DataFrame correctly? Is there a Pandas method for unmerging the cells and duplicating their contents?

Recommended answer

The answer you linked only needed to forward-fill the index column, which is why applying it to df.index leaves your data columns unchanged. For your use case, the NaN values sit in the DataFrame columns themselves, so simply forward-fill the entire DataFrame:

import pandas as pd

df = pd.read_excel("Input.xlsx")
print(df)

#    Order_ID Customer_name            Album_Name           Artist  Quantity
# 0       NaN           NaN            RadioShake              NaN       NaN
# 1       1.0       John H.         The Bodyguard  Whitney Houston       2.0
# 2       NaN           NaN              Lemonade          Beyonce       1.0
# 3       NaN           NaN  The Thrill Of It All        Sam Smith       2.0
# 4       NaN           NaN              Thriller  Michael Jackson      11.0
# 5       NaN           NaN                Divide       Ed Sheeran       4.0
# 6       NaN           NaN            Reputation     Taylor Swift       3.0
# 7       NaN           NaN        Red Pill Blues         Maroon 5       5.0

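# Forward-fill every column; same result as fillna(method='ffill'),
# which recent pandas releases deprecate in favor of ffill().
df = df.ffill()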
print(df)

#    Order_ID Customer_name            Album_Name           Artist  Quantity
# 0       NaN           NaN            RadioShake              NaN       NaN
# 1       1.0       John H.         The Bodyguard  Whitney Houston       2.0
# 2       1.0       John H.              Lemonade          Beyonce       1.0
# 3       1.0       John H.  The Thrill Of It All        Sam Smith       2.0
# 4       1.0       John H.              Thriller  Michael Jackson      11.0
# 5       1.0       John H.                Divide       Ed Sheeran       4.0
# 6       1.0       John H.            Reputation     Taylor Swift       3.0
# 7       1.0       John H.        Red Pill Blues         Maroon 5       5.0
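
Forward-filling every column also overwrites NaN values that are genuinely missing rather than the result of a merge. If that matters, one option is to drop the fully empty rows first and forward-fill only the columns that were merged in the sheet. The following is a minimal sketch under stated assumptions: the DataFrame is built by hand as a stand-in for the result of pd.read_excel, the second customer "Ann K." and the blank row are hypothetical data added for illustration, and the list of merged columns is chosen manually.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for pd.read_excel("Input.xlsx"): a merged cell shows up
# as a value in its first row and NaN in the remaining rows it spans.
df = pd.DataFrame({
    "Order_ID": [1.0, np.nan, np.nan, np.nan, 2.0, np.nan],
    "Customer_name": ["John H.", np.nan, np.nan, np.nan, "Ann K.", np.nan],
    "Album_Name": ["The Bodyguard", "Lemonade", np.nan,
                   "Thriller", "Divide", "Reputation"],
    "Quantity": [2.0, 1.0, np.nan, np.nan, 4.0, 3.0],
})

# Drop rows that are NaN in every column (blank spreadsheet rows).
df = df.dropna(how="all").reset_index(drop=True)

# Forward-fill only the columns that were merged in the sheet.
# Assumption: this list is picked by hand, not detected from the file.
merged_cols = ["Order_ID", "Customer_name"]
df[merged_cols] = df[merged_cols].ffill()

print(df)
```

With this approach the missing Quantity for "Thriller" stays NaN instead of inheriting the value from the row above, while Order_ID and Customer_name are repeated down each order.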
