Create binary variable based on number of unique / distinct values by group

Question

My data looks like this:

userID  <- c(1,1,1,2,2,2,3,3,3)
product <- c("a","a","a","b","b","c","a","b","c")
df <- data.frame(userID, product)

For each 'userID', I want to create a binary indicator variable that is 1 if there is more than one unique product and 0 if all products are the same.

So my filled vector would look like:

df$result <- c(0,0,0,1,1,1,1,1,1)
#    userID product result
# 1      1       a      0
# 2      1       a      0
# 3      1       a      0
# 4      2       b      1
# 5      2       b      1
# 6      2       c      1
# 7      3       a      1
# 8      3       b      1
# 9      3       c      1

E.g. user 1 has only one distinct product ('a') -> result = 0. User 2 has more than one unique product ('b' and 'c') -> result = 1.

Recommended answer

You can use base R:

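 # ave() repeats the per-group count on every row; it comes back as character
 # because the input is character, so convert it before comparing with 1.
 df$result <- with(df, as.integer(ave(as.character(product), userID,
                  FUN = function(x) length(unique(x)))) > 1) + 0
 df$result
 # [1] 0 0 0 1 1 1 1 1 1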
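
To make it easier to see what the ave() call does, here is a minimal sketch that builds the same indicator in two explicit steps with tapply(). The name n_products is only illustrative and is not part of the original answer.

```r
# Sample data from the question
userID  <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
product <- c("a", "a", "a", "b", "b", "c", "a", "b", "c")
df <- data.frame(userID, product)

# Step 1: count distinct products per user (one value per group,
# named by userID: "1", "2", "3")
n_products <- tapply(as.character(df$product), df$userID,
                     function(x) length(unique(x)))

# Step 2: look up each row's group count by userID, then flag counts above 1
df$result <- as.integer(n_products[as.character(df$userID)] > 1)
df$result
# [1] 0 0 0 1 1 1 1 1 1
```

ave() effectively combines both steps: it applies the function within each group and writes the result back at the original row positions.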

Or, as suggested by @David Arenburg, you could use transform and create the new variable result within the df:

  transform(df, result = (as.integer(ave(as.character(product),
          userID, FUN = function(x) length(unique(x)))) > 1) + 0)

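Or, using table:

 # Cross-tabulate userID by product; !! turns the counts into TRUE/FALSE,
 # so rowSums() gives the number of distinct products per user.
 tbl <- rowSums(!!table(df[, c("userID", "product")])) > 1
 (df$userID %in% names(tbl)[tbl]) + 0
 # [1] 0 0 0 1 1 1 1 1 1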
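
The table-based version is compact, so the following sketch spells out its intermediate values. It is an unpacked illustration under the same sample data; the names counts, n_products and multi are hypothetical and do not appear in the original answer.

```r
# Sample data from the question
userID  <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
product <- c("a", "a", "a", "b", "b", "c", "a", "b", "c")
df <- data.frame(userID, product)

# Cross-tabulate users (rows) against products (columns); cells hold row counts
counts <- table(df$userID, df$product)

# counts > 0 marks which products each user has; rowSums() counts them per user
n_products <- rowSums(counts > 0)

# userIDs (as character names) that have more than one distinct product
multi <- names(n_products)[n_products > 1]

# Flag every row that belongs to one of those users
result <- as.integer(df$userID %in% multi)
result
# [1] 0 0 0 1 1 1 1 1 1
```

Here counts > 0 plays the same role as the !! in the one-liner: both turn counts into logical values before summing.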
