How to Create New Columns to Store the Data of the Duplicate ID Column?
I have this dataframe:
   ID key
0   1   A
1   1   B
2   2   C
3   3   D
4   3   E
5   3   E
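To reproduce the problem, the sample frame can be rebuilt in pandas. This is a minimal sketch that assumes the column names ID and key exactly as shown above; the per-ID row count simply confirms how many key columns each ID would need.

```python
import pandas as pd

# Sample data from the question: ID 1 appears twice, ID 3 three times
df = pd.DataFrame({'ID': [1, 1, 2, 3, 3, 3],
                   'key': ['A', 'B', 'C', 'D', 'E', 'E']})

# Rows per ID, i.e. how many key columns each ID would need
counts = df.groupby('ID').size()
print(counts)
```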
I want to create additional key columns, as necessary, to store the data in the key column when there are duplicate IDs.
This is a snippet of the desired output:
   ID key key2
0   1   A    B  # Note: ID #1 appears twice in the dataframe, so the key value "B"
                # associated with the duplicate ID will be stored in the new column "key2"
The complete output should look like the following:
   ID key key2 key3
0   1   A    B  NaN
1   2   C  NaN  NaN
2   3   D    E    E  # ID #3 is repeated three times. The key of the second
                     # repeat, "E", will be stored under the "key2" column,
                     # and the key of the third repeat, "E", in the new column "key3"
Any suggestions or ideas on how I should approach this problem?
Thanks,
Check out groupby and apply; both are documented in the pandas API reference. You can then unstack the extra level of the MultiIndex that is created.
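import pandas as pd
# Build one Series per ID labelled key_0, key_1, ...; unstack moves those labels into columns
df.groupby('ID')['key'].apply(
    lambda s: pd.Series(s.values, index=['key_%s' % i for i in range(s.shape[0])])
).unstack(-1)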
This outputs:
   key_0 key_1 key_2
ID
1      A     B  None
2      C  None  None
3      D     E     E
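For a copy-and-paste check, here is a self-contained run of the same groupby/apply/unstack approach, assuming the sample data from the question. The intermediate name stacked is just for illustration. Depending on the pandas version, the empty slots may display as NaN rather than None, so the check below uses pd.isna.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'ID': [1, 1, 2, 3, 3, 3],
                   'key': ['A', 'B', 'C', 'D', 'E', 'E']})

# For each ID group, turn its keys into a Series labelled key_0, key_1, ...
# groupby/apply concatenates these into one Series indexed by (ID, key_i)
stacked = df.groupby('ID')['key'].apply(
    lambda s: pd.Series(s.values, index=['key_%s' % i for i in range(s.shape[0])])
)

# Move the inner key_i level into columns; IDs with fewer keys get missing values
wide = stacked.unstack(-1)
print(wide)
```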
If you want ID as a column, you can call reset_index on this DataFrame.
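If the goal is the exact layout from the question (ID as a column and headers key, key2, key3), one option is a different technique from the answer above: number each row within its ID group with cumcount, pivot on that counter, rename the columns, and finish with reset_index. This is a sketch under the assumption that the sample data and column names match the question; the helper column n is hypothetical.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'ID': [1, 1, 2, 3, 3, 3],
                   'key': ['A', 'B', 'C', 'D', 'E', 'E']})

# Alternative technique (not the answer's): number each row within its ID
# group (0, 1, 2, ...) in a hypothetical helper column n, then pivot on it
wide = (df.assign(n=df.groupby('ID').cumcount())
          .pivot(index='ID', columns='n', values='key'))

# Rename 0, 1, 2 -> key, key2, key3 to match the question's desired headers
wide.columns = ['key' if n == 0 else 'key%d' % (n + 1) for n in wide.columns]

# Turn the ID index back into an ordinary column
result = wide.reset_index()
print(result)
```

Because cumcount follows the original row order, the first key seen for an ID always lands in key, the second in key2, and so on.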