Read dataframe split by nan rows and reshape them into multiple dataframes in Python

Problem description

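I have an example Excel file, data1.xlsx, which has a Sheet1 as follows:

[Screenshot of Sheet1: one block per city (bj, sh, gz, sz), each with a year header row, a price row and a quantity row, separated by blank rows.]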

Now I want to read it with openpyxl or pandas and convert it into two new DataFrames, df1 and df2, which I will finally save as the price and quantity sheets.

price sheet:

quantity sheet:

Code I have used:

import numpy as np
import pandas as pd

df = pd.read_excel('./data1.xlsx', sheet_name='Sheet1')
# split the frame at the index of every fully empty row
df_list = np.split(df, df[df.isnull().all(axis=1)].index)

for df in df_list:
    print(df, '\n')

Out:

         bj  Unnamed: 1  Unnamed: 2  Unnamed: 3 Unnamed: 4
0      year      2018.0      2019.0      2020.0        sum
1     price        12.0         4.0         5.0         21
2  quantity         5.0         5.0         3.0         13 

         bj  Unnamed: 1  Unnamed: 2  Unnamed: 3 Unnamed: 4
3       NaN         NaN         NaN         NaN        NaN
4        sh         NaN         NaN         NaN        NaN
5      year      2018.0      2019.0      2020.0        sum
6     price         5.0         6.0         7.0         18
7  quantity         7.0         5.0         4.0         16 

    bj  Unnamed: 1  Unnamed: 2  Unnamed: 3 Unnamed: 4
8  NaN         NaN         NaN         NaN        NaN 

          bj  Unnamed: 1  Unnamed: 2  Unnamed: 3 Unnamed: 4
9        NaN         NaN         NaN         NaN        NaN
10        gz         NaN         NaN         NaN        NaN
11      year      2018.0      2019.0      2020.0        sum
12     price         2.0         3.0         1.0          6
13  quantity         6.0         9.0         3.0         18 

     bj  Unnamed: 1  Unnamed: 2  Unnamed: 3 Unnamed: 4
14  NaN         NaN         NaN         NaN        NaN 

          bj  Unnamed: 1  Unnamed: 2  Unnamed: 3 Unnamed: 4
15       NaN         NaN         NaN         NaN        NaN
16        sz         NaN         NaN         NaN        NaN
17      year      2018.0      2019.0      2020.0        sum
18     price         8.0         2.0         3.0         13
19  quantity         5.0         4.0         3.0         12 

How could I do that in Python? Thanks a lot.

Solution

Use:

import pandas as pd

# header=None keeps every sheet row as data and gives default integer column names
df = pd.read_excel('./data1.xlsx', sheet_name='Sheet1', header=None)

# use the second row (year, 2018, 2019, 2020, sum) as the column names
df.columns = df.iloc[1].rename(None)

# create a new column `city`: keep the first column only where the second column
# is missing (the city-name rows), then forward fill
df.insert(0, 'city', df.iloc[:, 0].mask(df.iloc[:, 1].notna()).ffill())
# convert float column names (2018.0) to integers (2018)
df.columns = [int(x) if isinstance(x, float) else x for x in df.columns]
# move the `year` column (the row labels) into the index
df = df.set_index('year')


print(df)
         city    2018    2019    2020  sum
year                                      
bj         bj     NaN     NaN     NaN  NaN
year       bj  2018.0  2019.0  2020.0  sum
price      bj    12.0     4.0     5.0   21
quantity   bj     5.0     5.0     3.0   13
NaN        bj     NaN     NaN     NaN  NaN
sh         sh     NaN     NaN     NaN  NaN
year       sh  2018.0  2019.0  2020.0  sum
price      sh     5.0     6.0     7.0   18
quantity   sh     7.0     5.0     4.0   16
NaN        sh     NaN     NaN     NaN  NaN
NaN        sh     NaN     NaN     NaN  NaN
gz         gz     NaN     NaN     NaN  NaN
year       gz  2018.0  2019.0  2020.0  sum
price      gz     2.0     3.0     1.0    6
quantity   gz     6.0     9.0     3.0   18
NaN        gz     NaN     NaN     NaN  NaN
NaN        gz     NaN     NaN     NaN  NaN
sz         sz     NaN     NaN     NaN  NaN
year       sz  2018.0  2019.0  2020.0  sum
price      sz     8.0     2.0     3.0   13
quantity   sz     5.0     4.0     3.0   12
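
The `mask(...).ffill()` step is the least obvious part, so here is a minimal sketch of it in isolation. The two-column `raw` frame is a hypothetical excerpt of the sheet (not read from data1.xlsx), and the names `only_cities` and `city` are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column excerpt of the raw sheet (as read with header=None):
# column 0 holds city names and row labels, column 1 holds the first value column.
raw = pd.DataFrame({
    0: ["bj", "year", "price", "quantity", np.nan, "sh", "year", "price", "quantity"],
    1: [np.nan, 2018.0, 12.0, 5.0, np.nan, np.nan, 2018.0, 5.0, 7.0],
})

# Rows whose second column is filled are data rows, so their label is blanked out;
# only the city-name rows keep a value (blank rows stay NaN).
only_cities = raw[0].mask(raw[1].notna())

# Forward fill carries each city name down to the rows beneath it.
city = only_cities.ffill()

print(city.tolist())
```

Because the blank separator rows are also filled with the preceding city, they show up labelled `bj`, `sh`, and so on in the printed frame above; they drop out later when only the `price` or `quantity` rows are selected.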


df1 = df.loc['price'].reset_index(drop=True)
print(df1)
  city  2018  2019  2020 sum
0   bj  12.0   4.0   5.0  21
1   sh   5.0   6.0   7.0  18
2   gz   2.0   3.0   1.0   6
3   sz   8.0   2.0   3.0  13

df2 = df.loc['quantity'].reset_index(drop=True)
print(df2)
  city  2018  2019  2020 sum
0   bj   5.0   5.0   3.0  13
1   sh   7.0   5.0   4.0  16
2   gz   6.0   9.0   3.0  18
3   sz   5.0   4.0   3.0  12
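
As an alternative to the approach above, and closer to the original `np.split` attempt, the frame can also be cut into one block per city at the blank rows. The following is only a sketch under assumptions: `raw` is a hypothetical in-memory stand-in for the first two city blocks (not the real data1.xlsx), and `split_blocks` is an illustrative helper, not a pandas function.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the first two city blocks of Sheet1,
# laid out the way pd.read_excel(..., header=None) would return them.
raw = pd.DataFrame([
    ["bj", np.nan, np.nan, np.nan, np.nan],
    ["year", 2018, 2019, 2020, "sum"],
    ["price", 12, 4, 5, 21],
    ["quantity", 5, 5, 3, 13],
    [np.nan] * 5,
    ["sh", np.nan, np.nan, np.nan, np.nan],
    ["year", 2018, 2019, 2020, "sum"],
    ["price", 5, 6, 7, 18],
    ["quantity", 7, 5, 4, 16],
])


def split_blocks(frame):
    """Return {city: block}, splitting wherever fully blank rows occur."""
    blank = frame.isna().all(axis=1)
    # The running count of blank rows gives every block its own group id;
    # the blank rows themselves are dropped before grouping.
    groups = frame[~blank].groupby(blank.cumsum()[~blank])
    blocks = {}
    for _, block in groups:
        city = block.iloc[0, 0]  # first row of a block: the city name
        # second row of a block: column labels, with 2018.0 turned into 2018
        header = [int(x) if isinstance(x, float) else x
                  for x in block.iloc[1, 1:]]
        body = block.iloc[2:].set_index(block.columns[0])
        blocks[city] = body.set_axis(header, axis=1).rename_axis(None)
    return blocks


blocks = split_blocks(raw)
combined = pd.concat(blocks)             # MultiIndex: (city, measure)
df1 = combined.xs("price", level=1)      # one row per city
df2 = combined.xs("quantity", level=1)
print(df1)
print(df2)
```

Here the city ends up in the index rather than in a `city` column; calling `reset_index()` on the result would turn it back into a column if that layout is preferred.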

Finally, the DataFrames can be written to the existing file by passing the mode='a' parameter to pd.ExcelWriter:

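# append mode is only supported by the openpyxl engine, so request it explicitly
with pd.ExcelWriter('data1.xlsx', mode='a', engine='openpyxl') as writer:
    df1.to_excel(writer, sheet_name='price')
    df2.to_excel(writer, sheet_name='quantity')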
