Pandas: how to get the unique values of a column that contains a list of values?

Views: 84 Published: 2020/5/24 2:32:27 Tags: python pandas

Problem description

Consider the following dataframe:

import pandas as pd

df = pd.DataFrame({'name' : [['one two','three four'], ['one'],[], [],['one two'],['three']],
                   'col' : ['A','B','A','B','A','B']})
df.sort_values(by='col',inplace=True)

df
Out[62]: 
  col                   name
0   A  [one two, three four]
2   A                     []
4   A              [one two]
1   B                  [one]
3   B                     []
5   B                [three]

I would like to add a column that holds, for each value of col, all the unique strings that appear in name across the rows of that group.

That is, the expected output is

df
Out[62]: 
  col                   name    unique_list
0   A  [one two, three four]    [one two, three four]
2   A                     []    [one two, three four]
4   A              [one two]    [one two, three four]
1   B                  [one]    [one, three]
3   B                     []    [one, three]
5   B                [three]    [one, three]

Indeed, for group A, the unique set of strings contained in [one two, three four], [] and [one two] is [one two, three four].

I can obtain the corresponding number of unique values using the approach from "Pandas : how to get the unique number of values in cells when cells contain lists?":

df['count_unique']=df.groupby('col')['name'].transform(lambda x: list(pd.Series(x.apply(pd.Series).stack().reset_index(drop=True, level=1).nunique())))


df
Out[65]: 
  col                   name count_unique
0   A  [one two, three four]            2
2   A                     []            2
4   A              [one two]            2
1   B                  [one]            2
3   B                     []            2
5   B                [three]            2

but replacing nunique with unique above fails.

Any ideas? Thanks!
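
One possible way to get there, offered as a minimal sketch rather than a definitive answer: collect the unique strings for each value of col with a plain Python loop, then map the result back onto the rows. The helper unique_strings and the dictionary unique_per_group are hypothetical names introduced for this illustration; they do not come from the question. The sketch deliberately sidesteps transform, on the assumption that the original attempt fails because unique returns a different number of items than the group has rows.

```python
import pandas as pd

df = pd.DataFrame({'name': [['one two', 'three four'], ['one'], [], [], ['one two'], ['three']],
                   'col': ['A', 'B', 'A', 'B', 'A', 'B']})
df.sort_values(by='col', inplace=True)


def unique_strings(lists):
    """Flatten an iterable of lists, dropping repeats and keeping first-seen order.

    Hypothetical helper written for this sketch.
    """
    seen = {}
    for values in lists:
        for value in values:
            seen.setdefault(value, None)
    return list(seen)


# Build one list of unique strings per value of col.
# Iterating over the grouped column yields (group key, Series of lists) pairs.
unique_per_group = {}
for key, lists in df.groupby('col')['name']:
    unique_per_group[key] = unique_strings(lists)

# Broadcast each group's list back to its rows; list(...) gives every row its own copy.
df['unique_list'] = df['col'].map(lambda key: list(unique_per_group[key]))

print(df)
```

Because the lists are built outside of pandas, the result does not depend on how groupby handles functions that return list-like values, which has varied between pandas versions. Empty lists in name simply contribute nothing to the group's result.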
