How to Create New Columns to Store the Data of the Duplicate ID Column?


Problem description

I have this dataframe:

   ID  key
0   1    A
1   1    B
2   2    C
3   3    D
4   3    E
5   3    E

I want to create additional key columns, as many as necessary, to store the values from the key column when an ID is duplicated.

This is a snippet of the output:

   ID  key  key2  
0   1    A     B # Note: ID#1 appeared twice in the dataframe, so the key value "B"
                 # associated with the duplicate ID will be stored in the new column "key2"

The complete output should look like the following:

    ID  key  key2   key3
0   1    A      B    NaN
1   2    C    NaN    NaN
2   3    D      E      E # ID#3 is repeated three times. The key of the
                         # second repeat, "E", will be stored under the "key2" column
                         # and the third repeat, "E", will be stored in the new column "key3"

Any suggestions on how I should approach this problem?

Thanks,

Solution

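Check out groupby and apply (both are covered in the pandas documentation). Applying a function that returns a Series to each group gives a result with a MultiIndex; you can then unstack the extra level of that MultiIndex to turn it into columns.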

import pandas as pd

# df is the DataFrame from the question, with columns 'ID' and 'key'
df.groupby('ID')['key'].apply(
    lambda s: pd.Series(s.values, index=['key_%s' % i for i in range(s.shape[0])])
).unstack(-1)

outputs

   key_0 key_1 key_2
ID                  
1      A     B  None
2      C  None  None
3      D     E     E

If you want ID as a column, you can call reset_index on this DataFrame.
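To put the pieces together, here is a minimal end-to-end sketch of the approach above, including the reset_index step. It rebuilds the sample DataFrame from the question so it can be run on its own. The helper name spread_keys and the key, key2, key3 labelling (chosen to match the column names the question asks for, rather than key_0, key_1, key_2) are my own assumptions for illustration, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [1, 1, 2, 3, 3, 3],
    "key": ["A", "B", "C", "D", "E", "E"],
})


def spread_keys(s):
    # Hypothetical helper: label the n-th key of an ID as key, key2, key3, ...
    labels = ["key" if i == 0 else "key%d" % (i + 1) for i in range(len(s))]
    return pd.Series(s.values, index=labels)


wide = (
    df.groupby("ID")["key"]
    .apply(spread_keys)  # Series indexed by (ID, label)
    .unstack(-1)         # move the label level into columns
    .reset_index()       # turn ID back into an ordinary column
)

print(wide)
```

This should give one row per ID, with missing values wherever an ID has fewer repeats than the most frequent one.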
