Pandas: how to get the unique values of a column that contains a list of values?
Problem description
Consider the following dataframe:
import pandas as pd

df = pd.DataFrame({'col': ['A', 'B', 'A', 'B', 'A', 'B'],
                   'name': [['one two', 'three four'], ['one'], [], [], ['one two'], ['three']]})
df.sort_values(by='col', inplace=True)
df
Out[62]:
col name
0 A [one two, three four]
2 A []
4 A [one two]
1 B [one]
3 B []
5 B [three]
I would like to get a column that keeps track of all the unique strings included in name for each value of col.
That is, the expected output is:
df
Out[62]:
col name unique_list
0 A [one two, three four] [one two, three four]
2 A [] [one two, three four]
4 A [one two] [one two, three four]
1 B [one] [one, three]
3 B [] [one, three]
5 B [three] [one, three]
Indeed, for group A, the unique set of strings included in [one two, three four], [] and [one two] is [one two, three four].
I can obtain the corresponding number of unique values using the answer to "Pandas: how to get the unique number of values in cells when cells contain lists?":
df['count_unique']=df.groupby('col')['name'].transform(lambda x: list(pd.Series(x.apply(pd.Series).stack().reset_index(drop=True, level=1).nunique())))
df
Out[65]:
col name count_unique
0 A [one two, three four] 2
2 A [] 2
4 A [one two] 2
1 B [one] 2
3 B [] 2
5 B [three] 2
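For comparison, a shorter way to get the same per-group count is sketched below, assuming pandas 0.25 or later (where Series.explode is available). It relies on transform broadcasting a scalar returned for each group back to every row of that group. That requirement is also a plausible reason the unique variant fails: unique returns an array whose length generally differs from the number of rows in the group, so transform cannot align it with the group's rows.

```python
import pandas as pd

df = pd.DataFrame({'col': ['A', 'B', 'A', 'B', 'A', 'B'],
                   'name': [['one two', 'three four'], ['one'], [], [], ['one two'], ['three']]})
df = df.sort_values(by='col')

# Within each group, explode() puts every list element on its own row
# (an empty list becomes a single NaN row); nunique() skips NaN by default.
# transform() broadcasts the scalar count to every row of the group.
df['count_unique'] = df.groupby('col')['name'].transform(lambda s: s.explode().nunique())
print(df)
```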
However, replacing nunique with unique in the code above fails.
Any ideas? Thanks!
Recommended answer
Here is a solution:
import numpy as np

df['unique_list'] = df.col.map(df.groupby('col')['name'].sum().apply(np.unique))
df
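The answer works by summing the lists within each group (for Python lists, + concatenates them), calling np.unique on each concatenated list, and mapping the per-group result back onto col. Note that np.unique returns a sorted NumPy array rather than a list. A minimal alternative sketch, assuming pandas 0.25 or later for DataFrame.explode, keeps the strings in order of first appearance and stores plain Python lists; the variable names exploded and uniques are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'col': ['A', 'B', 'A', 'B', 'A', 'B'],
                   'name': [['one two', 'three four'], ['one'], [], [], ['one two'], ['three']]})
df = df.sort_values(by='col')

# One row per list element; an empty list becomes a single NaN row.
exploded = df.explode('name')

# Unique strings per group, in order of first appearance (may include NaN).
uniques = exploded.groupby('col')['name'].unique()

# Drop the NaN placeholders left by empty lists and turn each array into a list.
uniques = uniques.apply(lambda arr: [s for s in arr if pd.notna(s)])

# Broadcast each group's list back to the rows of the original frame.
df['unique_list'] = df['col'].map(uniques)
print(df)
```

Exploding avoids building one long concatenated list per group, and the explicit NaN filter keeps groups whose lists are all empty (they get an empty list instead of disappearing).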