pandas unique values multiple columns

Views: 117 Published: 2017/3/25 23:14:34 Tags: python pandas dataframe unique

Question

import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': ['Bob', 'Joe', 'Bill', 'Mary', 'Joe'],
                   'Col2': ['Joe', 'Steve', 'Bob', 'Bob', 'Steve'],
                   'Col3': np.random.random(5)})

What is the best way to return the unique values of 'Col1' and 'Col2' combined?

The desired output is:

'Bob', 'Joe', 'Bill', 'Mary', 'Steve'

Recommended answer

One way is to select the columns and pass them to np.unique, which returns the unique values in sorted order:

>>> np.unique(df[['Col1', 'Col2']])
array(['Bill', 'Bob', 'Joe', 'Mary', 'Steve'], dtype=object)

Note that some versions of pandas/NumPy may require you to pass the underlying values explicitly, using the .values attribute:

np.unique(df[['Col1', 'Col2']].values)
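
On recent pandas versions, to_numpy() is the documented way to get the underlying array, and it can stand in for .values here. A minimal sketch, assuming a pandas version that provides DataFrame.to_numpy() (0.24 or later) and rebuilding the example frame without Col3 so the snippet runs on its own:

```python
import numpy as np
import pandas as pd

# Same name columns as in the question; Col3 is left out because it is not used.
df = pd.DataFrame({'Col1': ['Bob', 'Joe', 'Bill', 'Mary', 'Joe'],
                   'Col2': ['Joe', 'Steve', 'Bob', 'Bob', 'Steve']})

# Both accessors hand np.unique a 2D array, which it flattens and sorts.
via_values = np.unique(df[['Col1', 'Col2']].values)
via_to_numpy = np.unique(df[['Col1', 'Col2']].to_numpy())

print(via_values.tolist())
print(via_to_numpy.tolist())
```

Both calls should give the same sorted result, so which one to use is mostly a matter of which pandas version you are on.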

A faster way is to use pd.unique. This function uses a hashtable-based algorithm instead of NumPy's sort-based algorithm, so it returns values in order of first appearance rather than sorted. You will need to pass it a 1D array, which you can get with ravel():

>>> pd.unique(df[['Col1', 'Col2']].values.ravel())
array(['Bob', 'Joe', 'Steve', 'Bill', 'Mary'], dtype=object)
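
The different ordering of the two results follows from how the array is flattened and how each function works. A small self-contained sketch of the comparison (the variable names flat, sorted_unique and ordered_unique are just for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': ['Bob', 'Joe', 'Bill', 'Mary', 'Joe'],
                   'Col2': ['Joe', 'Steve', 'Bob', 'Bob', 'Steve'],
                   'Col3': np.random.random(5)})

# ravel() flattens row by row: Bob, Joe, Joe, Steve, Bill, Bob, ...
flat = df[['Col1', 'Col2']].values.ravel()

sorted_unique = np.unique(flat)    # sort-based: result comes back sorted
ordered_unique = pd.unique(flat)   # hash-based: keeps order of first appearance

print(list(sorted_unique))   # ['Bill', 'Bob', 'Joe', 'Mary', 'Steve']
print(list(ordered_unique))  # ['Bob', 'Joe', 'Steve', 'Bill', 'Mary']
```

If you need the output sorted, np.unique gives you that for free; if you only need the set of values, pd.unique avoids the sort.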

The difference in speed is significant for larger DataFrames:

>>> df1 = pd.concat([df]*100000) # DataFrame with 500000 rows
>>> %timeit np.unique(df1[['Col1', 'Col2']].values)
1 loops, best of 3: 619 ms per loop

>>> %timeit pd.unique(df1[['Col1', 'Col2']].values.ravel())
10 loops, best of 3: 49.9 ms per loop
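
The %timeit lines above only work inside IPython. If you want to repeat the comparison from a plain script, something like the following should do, as a rough sketch using the standard timeit module. The helper names with_numpy and with_pandas are made up for this example, and the absolute numbers will depend on your machine and library versions:

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': ['Bob', 'Joe', 'Bill', 'Mary', 'Joe'],
                   'Col2': ['Joe', 'Steve', 'Bob', 'Bob', 'Steve'],
                   'Col3': np.random.random(5)})
df1 = pd.concat([df] * 100000)  # DataFrame with 500000 rows


def with_numpy():
    # Sort-based unique over both columns.
    return np.unique(df1[['Col1', 'Col2']].values)


def with_pandas():
    # Hash-based unique over the flattened columns.
    return pd.unique(df1[['Col1', 'Col2']].values.ravel())


# Best of 3 single runs, similar in spirit to %timeit's "best of 3".
t_np = min(timeit.repeat(with_numpy, number=1, repeat=3))
t_pd = min(timeit.repeat(with_pandas, number=1, repeat=3))

print(f"np.unique: {t_np * 1000:.1f} ms")
print(f"pd.unique: {t_pd * 1000:.1f} ms")
```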
