Count unique values using pandas groupby
Question
I have data of the following form:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'group': [1, 1, 2, 3, 3, 3, 4],
    'param': ['a', 'a', 'b', np.nan, 'a', 'a', np.nan]
})
print(df)
# group param
# 0 1 a
# 1 1 a
# 2 2 b
# 3 3 NaN
# 4 3 a
# 5 3 a
# 6 4 NaN
Non-null values within groups are always the same. I want to count the non-null value for each group (where it exists) once, and then find the total counts for each value.
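Counting each group "once" only makes sense because of this guarantee, so it can be worth checking on the data before relying on it. A minimal sketch, assuming the sample frame above (the names `distinct_per_group` and `assumption_holds` are illustrative only):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'group': [1, 1, 2, 3, 3, 3, 4],
    'param': ['a', 'a', 'b', np.nan, 'a', 'a', np.nan],
})

# nunique() skips NaN by default, so each group reports how many distinct
# non-null values it holds (group 4 has only NaN, so it reports 0).
distinct_per_group = df.groupby('group')['param'].nunique()
print(distinct_per_group)

# The stated guarantee holds if no group has more than one distinct non-null value.
assumption_holds = bool((distinct_per_group <= 1).all())
print(assumption_holds)
```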
I'm currently doing this in the following (clunky and inefficient) way:
param = []
for _, group in df[df.param.notnull()].groupby('group'):
    param.append(group.param.unique()[0])
print(pd.DataFrame({'param': param}).param.value_counts())
# a 2
# b 1
I'm sure there's a way to do this more cleanly and without using a loop, but I just can't seem to work it out. Any help would be much appreciated.
Recommended answer
I think you can use SeriesGroupBy.nunique, grouping by param and counting the distinct group ids for each value:
print (df.groupby('param')['group'].nunique())
param
a 2
b 1
Name: group, dtype: int64
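Why this works can be seen by looking at the intermediate step. Grouping by param creates one bucket per value, and rows whose key is NaN are left out because groupby drops NaN keys by default. nunique then counts each group id once per bucket, however many rows it has. One caveat: if a group ever held two different non-null values, it would be counted under both, whereas the loop in the question keeps only the first; under the question's guarantee the two agree. A minimal sketch (the names `groups_per_value` and `counts` are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'group': [1, 1, 2, 3, 3, 3, 4],
    'param': ['a', 'a', 'b', np.nan, 'a', 'a', np.nan],
})

# One bucket per param value; NaN keys are excluded by groupby's default.
# expected: a -> [1, 3], b -> [2]
groups_per_value = df.groupby('param')['group'].unique()
print(groups_per_value)

# nunique() counts the distinct group ids in each bucket, so group 3,
# which repeats 'a' on two rows, contributes only once.
counts = df.groupby('param')['group'].nunique()
print(counts)
```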
Another solution uses unique, then creates a new DataFrame with DataFrame.from_records, reshapes it to a Series with stack, and finally counts the values with value_counts:
a = df[df.param.notnull()].groupby('group')['param'].unique()
print (pd.DataFrame.from_records(a.values.tolist()).stack().value_counts())
a 2
b 1
dtype: int64
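The second pipeline is easier to follow when split into its steps. The sketch below names each intermediate (`a`, `wide`, `result` are illustrative names), and then shows a shorter variant that is not part of the answer: because the question guarantees at most one distinct non-null value per group, taking `first()` per group and counting those values should give the same totals.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'group': [1, 1, 2, 3, 3, 3, 4],
    'param': ['a', 'a', 'b', np.nan, 'a', 'a', np.nan],
})

# Step 1: drop NaN rows, then collect each group's distinct values.
# Group 4 contains only NaN, so it disappears here.
a = df[df.param.notnull()].groupby('group')['param'].unique()

# Step 2: one row per group, one column per distinct value (a single column here).
wide = pd.DataFrame.from_records(a.values.tolist())

# Step 3: stack() flattens the frame into a Series; value_counts() tallies it.
result = wide.stack().value_counts()
print(result)  # expected counts: a -> 2, b -> 1

# Shorter variant (relies on the one-value-per-group guarantee).
alt = df.dropna(subset=['param']).groupby('group')['param'].first().value_counts()
print(alt)
```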