R, dplyr: cumulative version of n_distinct

Question

I have a data frame as follows. It is ordered by the column time.

Input:

df = data.frame(time = 1:20,
            grp = sort(rep(1:5,4)),
            var1 = rep(c('A','B'),10)
            )

head(df,10)
   time grp var1
1   1   1    A
2   2   1    B
3   3   1    A
4   4   1    B
5   5   2    A
6   6   2    B
7   7   2    A
8   8   2    B
9   9   3    A
10 10   3    B

I want to create another variable, var2, that counts the number of distinct var1 values seen so far, i.e. up to that point in time, within each group grp. This is a little different from what I'd get with n_distinct, which returns a single count per group rather than a running one.

Expected output:

   time grp var1 var2
1   1   1    A    1
2   2   1    B    2
3   3   1    A    2
4   4   1    B    2
5   5   2    A    1
6   6   2    B    2
7   7   2    A    2
8   8   2    B    2
9   9   3    A    1
10 10   3    B    2

I want to create a function for this, say cum_n_distinct, and use it as follows:

d_out = df %>%
  arrange(time) %>%
  group_by(grp) %>%
  mutate(var2 = cum_n_distinct(var1))


Recommended answer

Assuming the data is already ordered by time, first define a cumulative distinct function:

dist_cum <- function(var)
  sapply(seq_along(var), function(x) length(unique(head(var, x))))
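
dist_cum recomputes unique() on a growing prefix for every element, so its cost grows roughly quadratically with the group size. As a possible alternative (not part of the original answer), the same running count can be sketched with base R's duplicated() and cumsum(). The name cum_n_distinct is taken from the question; the sample vector x is made up for illustration.

```r
# Original helper from the answer: distinct count of each prefix.
dist_cum <- function(var)
  sapply(seq_along(var), function(x) length(unique(head(var, x))))

# Alternative sketch: an element raises the running count only the
# first time it appears, so count the non-duplicated positions.
cum_n_distinct <- function(x) cumsum(!duplicated(x))

# Hypothetical sample input, not from the original post.
x <- c("A", "B", "A", "C", "B")

dist_cum(x)        # 1 2 2 3 3
cum_n_distinct(x)  # 1 2 2 3 3
```

Because duplicated() works on character vectors, factors, and numbers alike, this version should drop into any of the grouped calls below in place of dist_cum.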

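Then a base solution that uses ave to create the groups and applies our function to each group. Note that ave needs a numeric input here, so var1 is converted to integer codes. The original answer assumed var1 was a factor; since R 4.0.0, data.frame() no longer converts strings to factors by default, so the call below wraps var1 in factor() first, which works in either case:

transform(df, var2=ave(as.integer(factor(var1)), grp, FUN=dist_cum))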

A data.table solution, basically doing the same thing:

library(data.table)
(data.table(df)[, var2:=dist_cum(var1), by=grp])

With dplyr, again the same thing:

library(dplyr)
df %>% group_by(grp) %>% mutate(var2=dist_cum(var1))
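
As a quick sanity check, the per-group result can be compared with the expected output from the question. The sketch below uses only base R so that it runs without extra packages; the vector name expected is introduced here for illustration, and its values are read off the expected-output table (1, 2, 2, 2 within every group).

```r
# Input data from the question.
df <- data.frame(time = 1:20,
                 grp  = sort(rep(1:5, 4)),
                 var1 = rep(c("A", "B"), 10))

# Helper from the answer: distinct count of each prefix.
dist_cum <- function(var)
  sapply(seq_along(var), function(x) length(unique(head(var, x))))

# Apply the helper within each grp; factor() makes the integer
# conversion safe whether var1 is character or factor.
df$var2 <- ave(as.integer(factor(df$var1)), df$grp, FUN = dist_cum)

# Every group holds A, B, A, B in time order, so the running
# distinct count is 1, 2, 2, 2 in each of the five groups.
expected <- rep(c(1L, 2L, 2L, 2L), times = 5)
stopifnot(all(df$var2 == expected))

head(df, 10)
```

The data.table and dplyr versions should produce the same var2 column, since they call the same helper on the same groups.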
