Get count of unique values in a row in pandas
Question
Suppose I have the following data frame:
0 1 2
new NaN NaN
new one one
a b c
NaN NaN NaN
How would I get the number of unique (non-NaN) values in each row, such as:
0 1 2 _num_unique_values
new NaN NaN 1
new one one 2
a b c 3
NaN NaN NaN 0
I thought it might be something like:
df['_num_unique_values'] = len(set(df.loc.tolist())) ??
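To try the answers below, the frame from the question can be rebuilt as follows. This is a sketch that assumes the blank cells are real missing values (`np.nan`) rather than the string "NaN"; the name `expected` is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame from the question; np.nan marks the missing cells (assumption)
df = pd.DataFrame(
    [
        ["new", np.nan, np.nan],
        ["new", "one", "one"],
        ["a", "b", "c"],
        [np.nan, np.nan, np.nan],
    ]
)

# Distinct non-NaN values per row, as listed in the desired output
expected = [1, 2, 3, 0]
print(df)
```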
Answer
Use a list comprehension with set:
import pandas as pd
# For each row (a NumPy object array), drop NaNs, then count distinct values
df['num_uniq'] = [len(set(v[pd.notna(v)].tolist())) for v in df.values]
df
0 1 2 num_uniq
0 new NaN NaN 1
1 new one one 2
2 a b c 3
3 NaN NaN NaN 0
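The `pd.notna(v)` mask is what makes the count reliable. NaN never compares equal to itself, so whether `set()` merges several NaNs depends only on whether they happen to be the same Python object. A small sketch of that behaviour (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

# Two separately created NaN floats: not the same object, and nan != nan
a, b = float("nan"), float("nan")
print(a == b)                 # False
print(len({a, b}))            # 2: set() keeps both NaNs as "distinct" values

# The same NaN object twice: set() collapses it through the identity check
print(len({np.nan, np.nan}))  # 1

# Masking with pd.notna first makes the count independent of this quirk
row = np.array(["new", a, b], dtype=object)
print(len(set(row[pd.notna(row)].tolist())))  # 1
```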
You could do this with stack, groupby and nunique.
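# Alternative: df.join(df.stack().groupby(level=0).nunique().to_frame('num_uniq'))
# With the default stack() used here, NaNs are dropped, so the all-NaN row 3 has
# no group: after index alignment it becomes NaN and the column is upcast to float
df['num_uniq'] = df.stack().groupby(level=0).nunique()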
df
0 1 2 num_uniq
0 new NaN NaN 1.0
1 new one one 2.0
2 a b c 3.0
3 NaN NaN NaN NaN
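If you want integer counts with 0 for the all-NaN row, one option is to reindex the grouped result against the original index. This is a sketch rather than part of the original answer; `reindex(..., fill_value=0)` is a no-op if your pandas version's `stack()` keeps the missing values instead of dropping them.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["new", np.nan, np.nan], ["new", "one", "one"], ["a", "b", "c"], [np.nan] * 3]
)

# Count distinct values per row from the long (stacked) form; rows whose
# values were all dropped by stack() have no group, so reindex against the
# original index and fill those gaps with 0 to keep an integer column
counts = df.stack().groupby(level=0).nunique()
df["num_uniq"] = counts.reindex(df.index, fill_value=0)
print(df)
```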
Another option is apply with nunique:
df['num_uniq'] = df.apply(pd.Series.nunique, axis=1)
df
0 1 2 num_uniq
0 new NaN NaN 1
1 new one one 2
2 a b c 3
3 NaN NaN NaN 0
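pandas also provides `DataFrame.nunique`, which the performance section below calls as `df.nunique(1)`. A minimal sketch of it with `axis=1`, including its `dropna` parameter: by default NaN is ignored, while `dropna=False` counts NaN as one extra distinct value. The name `with_nan` is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["new", np.nan, np.nan], ["new", "one", "one"], ["a", "b", "c"], [np.nan] * 3]
)

# Built-in row-wise count; NaN is ignored by default (dropna=True)
df["num_uniq"] = df.nunique(axis=1)

# With dropna=False, NaN counts as one more distinct value in its row
with_nan = df[[0, 1, 2]].nunique(axis=1, dropna=False)

print(df["num_uniq"].tolist())  # [1, 2, 3, 0]
print(with_nan.tolist())        # [2, 2, 3, 1]
```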
Performance
df_ = df
df = pd.concat([df_] * 1000, ignore_index=True)

%timeit df['num_uniq'] = [len(set(v[pd.notna(v)])) for v in df.values]
196 ms ± 10.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit df['num_uniq'] = df.stack().groupby(level=0).nunique()
6.34 ms ± 343 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit df['num_uniq'] = df.apply(pd.Series.nunique, axis=1)
679 ms ± 24 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit df['num_uniq'] = df.nunique(1)
3.21 ms ± 343 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

On this 4,000-row frame, the built-in df.nunique(1) (that is, DataFrame.nunique(axis=1)) was the fastest, followed by stack/groupby; apply was the slowest.
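If the snippets are run in order, the frame that gets concatenated for the benchmark already carries the num_uniq column from the previous step, so each timed method also counts that column. The sketch below times the four approaches on a fresh frame with Python's `timeit` module and first checks that they agree. The helper names (`by_set`, `by_stack`, `by_apply`, `by_builtin`) are hypothetical, and the stack variant adds `reindex(..., fill_value=0)` so row 3 compares as 0; timings will vary by machine and pandas version.

```python
import timeit
from functools import partial

import numpy as np
import pandas as pd


def by_set(d):
    # NaN-filtered set per row, as in the list-comprehension answer
    return pd.Series(
        [len(set(v[pd.notna(v)].tolist())) for v in d.values], index=d.index
    )


def by_stack(d):
    # Long form + groupby; reindex fills rows that stack() may have dropped
    return d.stack().groupby(level=0).nunique().reindex(d.index, fill_value=0)


def by_apply(d):
    # One Python-level Series.nunique call per row
    return d.apply(pd.Series.nunique, axis=1)


def by_builtin(d):
    # pandas' own row-wise count
    return d.nunique(axis=1)


base = pd.DataFrame(
    [["new", np.nan, np.nan], ["new", "one", "one"], ["a", "b", "c"], [np.nan] * 3]
)
# Same scaling as the benchmark above: 4 rows repeated 1000 times
big = pd.concat([base] * 1000, ignore_index=True)
approaches = {"set": by_set, "stack": by_stack, "apply": by_apply, "nunique": by_builtin}

# Check that every approach returns the same counts before timing them
reference = by_builtin(big)
for name, fn in approaches.items():
    assert fn(big).tolist() == reference.tolist(), name

# Average wall-clock time over a few calls per approach
for name, fn in approaches.items():
    seconds = timeit.timeit(partial(fn, big), number=3) / 3
    print(f"{name}: {seconds * 1000:.2f} ms per call")
```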