Adding a grouped, aggregate nunique column to a pandas dataframe

Question

I want to add an aggregate, grouped, nunique column to my pandas dataframe without aggregating the entire dataframe. I'm trying to do this in one line and avoid creating a new aggregated object, merging it back, and so on.

My df has track, type, and id columns. I want the number of unique ids for each track/type combination as a new column in the table, without collapsing the track/type combinations in the resulting df: same number of rows, one more column.

Something like this doesn't work:

df['n_unique_id'] = df.groupby(['track', 'type'])['id'].nunique()

Nor does this:

df['n_unique_id'] = df.groupby(['track', 'type'])['id'].transform(nunique)

This last form works with some aggregating functions but not others. The following works (but is meaningless on my dataset):

df['n_unique_id'] = df.groupby(['track', 'type'])['id'].transform(sum)

In R this is easily done in data.table with:

df[, n_unique_id := uniqueN(id), by = c('track', 'type')]

Thanks!

Recommended answer

df.groupby(['track', 'type'])['id'].transform(nunique)

This implies that there is a name nunique in the namespace that refers to some function. Unless you have defined such a name yourself, Python raises a NameError before pandas is ever called. transform accepts either a function or a string that it knows how to map to a function, and 'nunique' is definitely one of those strings.
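To make the distinction concrete, here is a minimal sketch. The small frame is illustrative only (it is not the asker's data), and the variable names are assumptions chosen for the example. The bare name fails at the Python level, while the string is resolved by pandas.

```python
import pandas as pd

# Illustrative data, not the asker's dataset.
df = pd.DataFrame({
    "track": [1, 1, 2, 2],
    "type": ["A", "A", "B", "B"],
    "id": ["X", "Y", "W", "W"],
})
grouped = df.groupby(["track", "type"])["id"]

# The bare name is looked up by Python itself; nothing called
# `nunique` is defined here, so this raises NameError.
try:
    grouped.transform(nunique)
    bare_name_error = None
except NameError as exc:
    bare_name_error = exc

# The string is resolved by pandas to its own groupby nunique.
counts = grouped.transform("nunique")
```

The same reasoning explains why transform(sum) works in the question: sum happens to be a Python builtin, so the name exists.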

As pointed out by @root, the methods pandas uses to perform the transformations named by these strings are often optimized, so the strings should generally be preferred to passing your own functions. In some cases this is true even compared with passing NumPy functions.

For example, transform('sum') should be preferred to transform(sum).
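As a rough illustration of the two styles, the sketch below computes the same per-group unique count once with the string alias and once with a hand-written callable. The data and variable names are assumptions made for the example. It only shows that the results agree; any speed difference depends on your data and pandas version, so measure before relying on it.

```python
import pandas as pd

# Illustrative data, not the asker's dataset.
df = pd.DataFrame({
    "track": [1, 1, 1, 2, 2],
    "type": ["A", "A", "A", "B", "B"],
    "id": ["X", "X", "Y", "W", "W"],
})
grouped = df.groupby(["track", "type"])["id"]

# String alias: pandas dispatches to its built-in groupby nunique.
by_name = grouped.transform("nunique")

# Hand-written callable: pandas calls it once per group and
# broadcasts the scalar result back to that group's rows.
by_callable = grouped.transform(lambda s: s.nunique())

same = by_name.tolist() == by_callable.tolist()
```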

Try this instead:

df.groupby(['track', 'type'])['id'].transform('nunique')

Demo

import pandas as pd

# Small example frame: two track/type groups with repeated ids
df = pd.DataFrame(dict(
    track=list('11112222'), type=list('AAAABBBB'), id=list('XXYZWWWW')))
print(df)

  id track type
0  X     1    A
1  X     1    A
2  Y     1    A
3  Z     1    A
4  W     2    B
5  W     2    B
6  W     2    B
7  W     2    B

df.groupby(['track', 'type'])['id'].transform('nunique')

0    3
1    3
2    3
3    3
4    1
5    1
6    1
7    1
Name: id, dtype: int64
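Putting it together for the original goal, the following sketch assigns the transformed result back as a new column on the demo frame. The column name n_unique_id comes from the question; rows_before is a helper name introduced here only to check the row count. Because transform returns one value per original row, aligned on the frame's own index, the assignment adds a column without collapsing any rows. A plain .nunique() instead returns one row per group, indexed by the group keys, which is why it does not line up with df.

```python
import pandas as pd

df = pd.DataFrame(dict(
    track=list('11112222'), type=list('AAAABBBB'), id=list('XXYZWWWW')))

rows_before = len(df)

# One line, no intermediate aggregated object and no merge:
# each row receives the unique-id count of its track/type group.
df['n_unique_id'] = df.groupby(['track', 'type'])['id'].transform('nunique')
```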
