Transform Pandas DataFrame, add row values as column headers

Question

I have a pandas DataFrame like this:

COMMIT_ID | FILE_NAME     | COMMITTER | CHANGE TYPE
-------------------------------------------------------------
  1       |  package.json | A         | MODIFY
  2       |  main.js      | B         | ADD
  2       |  class.java   | B         | DELETE

I want the row values of FILE_NAME to become column headers, with the CHANGE TYPE as the cell value:

COMMIT_ID | package.json | main.js     | class.java     | COMMITTER
-----------------------------------------------------------------------------
  1       |  MODIFY      |  NONE       |  NONE          | A         
  2       |  NONE        |  ADD        |  DELETE        | B      

I have tried pandas.pivot_table but wasn't very successful. Is there an easy way to do this?

Recommended answer

I think you need set_index + unstack:

# Parentheses are required so the method chain can span several lines
df = (df.set_index(['COMMIT_ID','COMMITTER','FILE_NAME'])['CHANGE TYPE']
        .unstack()
        .reset_index())
print (df)
FILE_NAME  COMMIT_ID COMMITTER class.java main.js package.json
0                  1         A       None    None       MODIFY
1                  2         B     DELETE     ADD         None
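
For readers who want to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the table in the question, and the name `wide` is only illustrative. Note that recent pandas versions may print missing cells as NaN rather than None.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "COMMIT_ID": [1, 2, 2],
    "FILE_NAME": ["package.json", "main.js", "class.java"],
    "COMMITTER": ["A", "B", "B"],
    "CHANGE TYPE": ["MODIFY", "ADD", "DELETE"],
})

# Move FILE_NAME into the index, then pivot its values into columns
wide = (
    df.set_index(["COMMIT_ID", "COMMITTER", "FILE_NAME"])["CHANGE TYPE"]
      .unstack()
      .reset_index()
)
print(wide)
```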

Solutions with pivot_table need an aggregate function when there are duplicates, such as sum (concatenates strings without a separator) or '_'.join (concatenates strings with a separator):

print (df)
   COMMIT_ID     FILE_NAME COMMITTER CHANGE TYPE
0          1  package.json         A      MODIFY
1          2       main.js         B         ADD
2          2    class.java         B      DELETE
3          2    class.java         B         ADD
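
The repeated (COMMIT_ID, COMMITTER, FILE_NAME) combination in rows 2 and 3 is why an aggregate function becomes necessary. The sketch below, using a hand-built copy of this data under the illustrative name `df_dup`, shows that plain set_index + unstack refuses to reshape it.

```python
import pandas as pd

# Sample data with a duplicated (COMMIT_ID, COMMITTER, FILE_NAME) key
df_dup = pd.DataFrame({
    "COMMIT_ID": [1, 2, 2, 2],
    "FILE_NAME": ["package.json", "main.js", "class.java", "class.java"],
    "COMMITTER": ["A", "B", "B", "B"],
    "CHANGE TYPE": ["MODIFY", "ADD", "DELETE", "ADD"],
})

try:
    df_dup.set_index(["COMMIT_ID", "COMMITTER", "FILE_NAME"])["CHANGE TYPE"].unstack()
    unstack_failed = False
except ValueError as exc:
    # unstack cannot place two values into the same cell
    unstack_failed = True
    print("unstack failed:", exc)
```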


df = df.pivot_table(index=['COMMIT_ID','COMMITTER'], 
                    columns='FILE_NAME', 
                    values='CHANGE TYPE', 
                    aggfunc='sum').reset_index()
print (df)
FILE_NAME  COMMIT_ID COMMITTER class.java main.js package.json
0                  1         A       None    None       MODIFY
1                  2         B  DELETEADD     ADD         None

Or:

df = df.pivot_table(index=['COMMIT_ID','COMMITTER'], 
                    columns='FILE_NAME', 
                    values='CHANGE TYPE', 
                    aggfunc='_'.join).reset_index()
print (df)
FILE_NAME  COMMIT_ID COMMITTER  class.java main.js package.json
0                  1         A        None    None       MODIFY
1                  2         B  DELETE_ADD     ADD         None
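
A runnable version of the '_'.join variant might look like the following. The input frame is retyped from the printed data above, and `wide` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    "COMMIT_ID": [1, 2, 2, 2],
    "FILE_NAME": ["package.json", "main.js", "class.java", "class.java"],
    "COMMITTER": ["A", "B", "B", "B"],
    "CHANGE TYPE": ["MODIFY", "ADD", "DELETE", "ADD"],
})

# Duplicated keys are merged into one cell, joined with an underscore
wide = df.pivot_table(
    index=["COMMIT_ID", "COMMITTER"],
    columns="FILE_NAME",
    values="CHANGE TYPE",
    aggfunc="_".join,
).reset_index()
print(wide)
```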

Aggregating with first also works, but you can lose duplicate values:

df = df.pivot_table(index=['COMMIT_ID','COMMITTER'], 
                    columns='FILE_NAME', 
                    values='CHANGE TYPE', 
                    aggfunc='first').reset_index()
print (df)
FILE_NAME  COMMIT_ID COMMITTER class.java main.js package.json
0                  1         A       None    None       MODIFY
1                  2         B     DELETE     ADD         None
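
Before settling on first, it may be worth checking how many rows would be silently dropped. The following is one possible check, not part of the original answer; `key`, `dup_rows` and `n_lost` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "COMMIT_ID": [1, 2, 2, 2],
    "FILE_NAME": ["package.json", "main.js", "class.java", "class.java"],
    "COMMITTER": ["A", "B", "B", "B"],
    "CHANGE TYPE": ["MODIFY", "ADD", "DELETE", "ADD"],
})

# Columns that together identify one cell of the pivoted table
key = ["COMMIT_ID", "COMMITTER", "FILE_NAME"]

# All rows whose key occurs more than once
dup_rows = df[df.duplicated(subset=key, keep=False)]
print(dup_rows)

# Number of rows that keeping only the first value per key would discard
n_lost = len(df) - len(df.drop_duplicates(subset=key))
print(n_lost)
```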

Finally, to remove the columns name (FILE_NAME) from the header, add rename_axis:

df = df.rename_axis(None, axis=1)
print (df)
   COMMIT_ID COMMITTER class.java main.js package.json
0          1         A       None    None       MODIFY
1          2         B  DELETEADD     ADD         None
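
To get closer to the exact layout requested in the question (NONE for missing cells, file columns in their original order, COMMITTER last), the steps can be combined as in this sketch. The fillna call and the column reordering are additions that do not appear in the answer above, and `file_order` and `wide` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "COMMIT_ID": [1, 2, 2],
    "FILE_NAME": ["package.json", "main.js", "class.java"],
    "COMMITTER": ["A", "B", "B"],
    "CHANGE TYPE": ["MODIFY", "ADD", "DELETE"],
})

# File names in order of first appearance (unstack sorts them alphabetically)
file_order = list(df["FILE_NAME"].unique())

wide = (
    df.set_index(["COMMIT_ID", "COMMITTER", "FILE_NAME"])["CHANGE TYPE"]
      .unstack()
      .reset_index()
      .rename_axis(None, axis=1)   # drop the "FILE_NAME" columns name
      .fillna("NONE")              # fill cells that had no change
)

# Reorder columns: id first, files in original order, committer last
wide = wide[["COMMIT_ID", *file_order, "COMMITTER"]]
print(wide)
```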
