How to merge a multi-row header of a pandas DataFrame into a single-cell header?

Question

I have a pandas DataFrame read from an Excel file whose header is split across multiple rows, as in the following example:

    0           1       2       3           4           5           6           7
5   NaN         NaN     NaN     NaN         NaN         NaN         NaN         Above
6   Planting    Harvest NaN     Flowering   Maturity    Maturity    Maturity    ground
7   date        date    Yield   date        date        date        date        biomass
8   YYYY.DDD    YYYY.DDD(kg/ha) YYYY.DDD    YYYY.DDD    YYYY.DDD    YYYY.DDD    (kg/ha)
9   NaN         NaN     NaN     NaN         NaN         NaN         NaN         NaN
10  1999.26     2000.21 5669.46 2000.14     2000.19     2000.19     2000.19     11626.7
11  2000.27     2001.22 10282.5 2001.15     2001.2      2001.2      2001.2      20565
12  2001.27     2002.22 8210.09 2002.15     2002.2      2002.2      2002.2      16509

I need to merge rows 5 to 9 (inclusive) column by column, joining the cells with a single space, so that there is just one header row like this (I've formatted the table for readability, so it contains more tabs than the real data does):

Planting date YYYY.DDD   Harvest date YYYY.DDD    Yield (kg/ha)  Flowering date YYYY.DDD     Maturity date YYYY.DDD  Maturity date YYYY.DDD  Maturity date YYYY.DDD Above ground biomass (kg/ha)
1999.262                2000.206                5669.45623      2000.138                    2000.19                 2000.19                 2000.19                 11626.73122
2000.268                2001.216                10282.49713     2001.151                    2001.2                  2001.2                  2001.2                  20564.99427
2001.272                2002.217                8210.091653     2002.155                    2002.201                2002.201                2002.201                16509.03802

I guess this should be rather trivial, but I can't find a solution.

Any help would be greatly appreciated.

Recommended answer

You can first select the header rows with loc, replace NaN with empty strings using fillna, and then apply ' '.join to each column. If necessary, strip leading and trailing whitespace with str.strip, and then drop the header rows by selecting df.loc[10:]:

df.columns = df.loc[5:9].fillna('').apply(' '.join).str.strip()

# if you need a monotonic index (0, 1, 2, ...), add reset_index
print(df.loc[10:].reset_index(drop=True))
  Planting date YYYY.DDD Harvest date YYYY.DDD(kg/ha) Yield YYYY.DDD  \
0                1999.26                      2000.21        5669.46   
1                2000.27                      2001.22        10282.5   
2                2001.27                      2002.22        8210.09   

  Flowering date YYYY.DDD Maturity date YYYY.DDD Maturity date YYYY.DDD  \
0                 2000.14                2000.19                2000.19   
1                 2001.15                 2001.2                 2001.2   
2                 2002.15                 2002.2                 2002.2   

  Maturity date (kg/ha) Above ground biomass  
0               2000.19              11626.7  
1                2001.2                20565  
2                2002.2                16509  
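
The same idea can be sketched as a small self-contained script. This is only an illustration under stated assumptions: it uses a cut-down, four-column reconstruction of the sample in which every unit sits in its own cell (the sample as pasted fuses YYYY.DDD(kg/ha) into one cell in row 8, which is why the unit labels look shifted in the output above). The helper name merge_header and the variable names are hypothetical. Dropping NaN before joining is a variation on the fillna/str.strip approach; it also avoids doubled spaces when a blank cell sits in the middle of a header.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of part of the sample:
# header fragments live in rows 5-9, data starts at row 10.
raw = pd.DataFrame(
    [
        [np.nan, np.nan, np.nan, "Above"],
        ["Planting", "Harvest", np.nan, "ground"],
        ["date", "date", "Yield", "biomass"],
        ["YYYY.DDD", "YYYY.DDD", "(kg/ha)", "(kg/ha)"],
        [np.nan, np.nan, np.nan, np.nan],
        [1999.26, 2000.21, 5669.46, 11626.7],
        [2000.27, 2001.22, 10282.5, 20565],
    ],
    index=range(5, 12),
    dtype=object,
)


def merge_header(df, first, last):
    """Join rows first..last (inclusive index labels) column by column
    into a single header, then return the remaining rows as the body."""
    # Skip NaN cells so no leading, trailing, or doubled spaces appear.
    header = df.loc[first:last].apply(
        lambda col: " ".join(str(v) for v in col.dropna())
    )
    # Everything after the header rows is data; renumber it from 0.
    body = df.loc[last + 1:].reset_index(drop=True)
    body.columns = list(header)
    # The columns were read as object dtype, so convert them to numbers.
    return body.apply(pd.to_numeric)


tidy = merge_header(raw, 5, 9)
print(tidy)
```

If the header rows are known before the file is read, another option may be to pass a list of row positions to the header argument of pd.read_excel and flatten the resulting MultiIndex columns; whether that is convenient depends on how the blank cells in the sheet are laid out.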
