Groupby and transpose in pandas, Python


Question

I have this DataFrame:

ID  col  col2   col3   col4

1   A    50      S      1
1   A    52      M      4
1   B    45      N      8
1   C    18      S      7
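For anyone who wants to reproduce the problem, the table above can be built roughly as follows. This is only a sketch: the dtypes (integers for ID, col2 and col4, strings for col and col3) are an assumption read off the printed values, not something the question states.

```python
import pandas as pd

# Sample data copied from the table above.
# Assumption: ID, col2 and col4 are integers; col and col3 are strings.
df = pd.DataFrame({
    "ID": [1, 1, 1, 1],
    "col": ["A", "A", "B", "C"],
    "col2": [50, 52, 45, 18],
    "col3": ["S", "M", "N", "S"],
    "col4": [1, 4, 8, 7],
})
print(df)
```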

Desired DataFrame:

ID  col  colA   colB   colC   colD   colE   colF

1   A    50     52      S      M       1      4
1   B    45     NULL    N     NULL     8     NULL
1   C    18     NULL    S     NULL     7     NULL

I want one row per unique ID + col combination (group by ID and col). Each ID + col pair has at most two entries. If there are two, put the first value of col2 in colA and the second in colB, the first value of col3 in colC and the second in colD, and the first value of col4 in colE and the second in colF. If there is only one entry for an ID + col pair, put its col2 value in colA and leave colB null, and likewise for the other columns.

I first tried to create a counter:

df['COUNT'] = df.groupby(['ID','col']).cumcount()+1

From here I was thinking of adding columns along these lines:

if count=1 then df['colA']=df.col2
if count=2 then df['colB']=df.col2

...but this still leaves the result with the same number of rows as the original df.
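To make the counter step concrete, here is a minimal sketch that rebuilds the sample data and prints the counter values. `cumcount` numbers the rows inside each (ID, col) group starting from 0, which is why 1 is added. The DataFrame construction is an assumption based on the table in the question.

```python
import pandas as pd

# Sample data rebuilt from the question (dtypes are an assumption)
df = pd.DataFrame({
    "ID": [1, 1, 1, 1],
    "col": ["A", "A", "B", "C"],
    "col2": [50, 52, 45, 18],
    "col3": ["S", "M", "N", "S"],
    "col4": [1, 4, 8, 7],
})

# Position of each row within its (ID, col) group: 1 = first, 2 = second
df["COUNT"] = df.groupby(["ID", "col"]).cumcount() + 1

print(df["COUNT"].tolist())  # [1, 2, 1, 1]
print(len(df))               # 4, the counter alone does not reduce the row count
```

The counter only labels rows. Collapsing each group into a single row still needs a reshaping step.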

Recommended answer

I think you need set_index with unstack:

df['COUNT'] = df.groupby(['ID','col']).cumcount()+1

df = df.set_index(['ID','col', 'COUNT'])['col2'].unstack().add_prefix('col').reset_index()
print (df)
COUNT  ID col  col1  col2
0       1   A  50.0  52.0
1       1   B  45.0   NaN
2       1   C  18.0   NaN

Or, without adding a helper column:

c = df.groupby(['ID','col']).cumcount()+1

df = df.set_index(['ID','col', c])['col2'].unstack().add_prefix('col').reset_index()
print (df)
   ID col  col1  col2
0   1   A  50.0  52.0
1   1   B  45.0   NaN
2   1   C  18.0   NaN

For multiple columns the solution changes slightly, because unstack then produces a MultiIndex in the columns:

df['COUNT'] = (df.groupby(['ID','col']).cumcount()+1).astype(str)

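# no ['col2'] selection this time, so every remaining column is unstacked
df = df.set_index(['ID','col', 'COUNT']).unstack()
# flatten the MultiIndex columns, e.g. ('col2', '1') -> 'col2_1'
df.columns = df.columns.map('_'.join)
df = df.reset_index()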
print (df)
   ID col  col2_1  col2_2 col3_1 col3_2  col4_1  col4_2
0   1   A    50.0    52.0      S      M     1.0     4.0
1   1   B    45.0     NaN      N   None     8.0     NaN
2   1   C    18.0     NaN      S   None     7.0     NaN
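The output above uses names like col2_1 rather than the colA to colF names requested in the question. One possible way to get the requested names is sketched below. It is not part of the original answer: the `rename_map` dictionary and the `wide` variable are hypothetical names introduced here, and the mapping assumes at most two rows per ID + col pair, as stated in the question.

```python
import pandas as pd

# Sample data rebuilt from the question (dtypes are an assumption)
df = pd.DataFrame({
    "ID": [1, 1, 1, 1],
    "col": ["A", "A", "B", "C"],
    "col2": [50, 52, 45, 18],
    "col3": ["S", "M", "N", "S"],
    "col4": [1, 4, 8, 7],
})

# Position of each row within its (ID, col) group: 1 = first, 2 = second
counter = df.groupby(["ID", "col"]).cumcount() + 1

# Move the counter into the columns; the result has MultiIndex columns
# such as ('col2', 1), ('col2', 2), ('col3', 1), ...
wide = df.set_index(["ID", "col", counter]).unstack()

# Hypothetical mapping from (source column, position) to the requested names.
# Assumes no group has more than two rows.
rename_map = {
    ("col2", 1): "colA", ("col2", 2): "colB",
    ("col3", 1): "colC", ("col3", 2): "colD",
    ("col4", 1): "colE", ("col4", 2): "colF",
}
wide.columns = [rename_map[key] for key in wide.columns]
wide = wide.reset_index()
print(wide)
```

An explicit mapping fails loudly with a `KeyError` if a third entry ever shows up in a group, which seems preferable to silently mislabeling columns.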
