R: summarize unique values across columns based on values from one column


Problem description

I want to know the number of unique values in each column, grouped by the values of var_1.

For example:

Test <- data.frame(
  var_1 = c("a", "a", "a", "b", "b", "c", "c", "c", "c", "c"),
  var_2 = c("bl", "bf", "bl", "bl", "bf", "bl", "bl", "bf", "bc", "bg"),
  var_3 = c("cf", "cf", "eg", "cf", "cf", "eg", "cf", "dr", "eg", "fg")
)

The result I am looking for is grouped by the values in var_1 and should be:

var_1 var_2 var_3
a     2     2
b     2     1
c     4     4

However, after trying various approaches (including apply and table), aggregate has come closest to what I am looking for. This script, though, returns the total number of entries for each value of var_1 rather than the number of unique values:

agbyv1 <- aggregate(. ~ var_1, Test, length)

var_1 var_2 var_3
a     3     3
b     2     2
c     5     5

I tried

unqbyv1= aggregate(. ~ var_1, Test, length(unique(x)))

but this did not work.

Thank you very much for your help.

Recommended answer

Try

library(dplyr)
# dplyr >= 1.0.0; older versions used summarise_each(funs(n_distinct(.)))
Test %>%
  group_by(var_1) %>%
  summarise(across(everything(), n_distinct))

library(data.table)#v1.9.5+
setDT(Test)[, lapply(.SD, uniqueN), var_1]

If there are NAs:

setDT(Test)[, lapply(.SD, function(x) uniqueN(na.omit(x))), var_1]
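For readers without data.table, the same per-column NA handling can be sketched in base R. This is only an illustration: the data frame `df` and the helper `n_unique_non_na` are hypothetical names introduced here, not part of the original answer.

```r
# Hypothetical data with missing values (not from the question).
df <- data.frame(
  var_1 = c("a", "a", "a", "b", "b"),
  var_2 = c("bl", NA, "bf", "bl", "bl"),
  var_3 = c("cf", "eg", "cf", "cf", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count distinct non-missing values in a vector.
n_unique_non_na <- function(x) length(unique(x[!is.na(x)]))

# The data.frame method of aggregate() hands NAs in the value columns
# to FUN unchanged, so each column's NAs are dropped independently.
res_na <- aggregate(df[c("var_2", "var_3")], by = df["var_1"],
                    FUN = n_unique_non_na)
print(res_na)
```

Because the NAs are removed inside the helper, a missing value in one column does not affect the count for any other column.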

Or you can use aggregate. The formula method uses na.action = na.omit by default, so no modification is needed. Note that na.omit drops every row containing an NA in any column before aggregating.

aggregate(.~ var_1, Test, FUN=function(x) length(unique(x)) )
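As a quick check, the aggregate call can be run on the sample data from the question. The attempt in the question failed because FUN must be a function, whereas `length(unique(x))` is an expression that R tries to evaluate immediately. The sketch below is self-contained; `count_unique`, `Test_na` and the result names are hypothetical names used only for illustration. It also shows how the default na.omit behaves once a single NA is introduced.

```r
# Sample data from the question.
Test <- data.frame(
  var_1 = c("a", "a", "a", "b", "b", "c", "c", "c", "c", "c"),
  var_2 = c("bl", "bf", "bl", "bl", "bf", "bl", "bl", "bf", "bc", "bg"),
  var_3 = c("cf", "cf", "eg", "cf", "cf", "eg", "cf", "dr", "eg", "fg"),
  stringsAsFactors = FALSE
)

# FUN has to be a function object; a named helper makes that explicit.
count_unique <- function(x) length(unique(x))

res <- aggregate(. ~ var_1, data = Test, FUN = count_unique)
print(res)

# Introduce one NA in var_3 (second row, group "a").
Test_na <- Test
Test_na$var_3[2] <- NA

# With the default na.action = na.omit the whole second row is dropped,
# so the var_2 count for group "a" changes as well.
res_rowdrop <- aggregate(. ~ var_1, data = Test_na, FUN = count_unique)
print(res_rowdrop)
```

For this data, group c has four distinct var_2 values (bl, bf, bc, bg). In the second call, the var_2 count for group a falls from 2 to 1 although var_2 itself has no missing values, which is worth keeping in mind when NAs are spread unevenly across columns.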
