Aggregating data based on unique triads in R


Question

I was referred to "Counting existing permutations in R" from a previous, related question, but I can't apply it to my problem. Here is my data:

One <- c(rep("X",6),rep("Y",3),rep("Z",2))
Two <- c(rep("A",4),rep("B",6),rep("C",1))
Three <- c(rep("J",5),rep("K",2),rep("L",4))
Number <- runif(11)


df <- data.frame(One,Two,Three,Number)


   One Two Three     Number
1    X   A     J 0.10511669
2    X   A     J 0.62467760
3    X   A     J 0.24232663
4    X   A     J 0.38358854
5    X   B     J 0.04658226
6    X   B     K 0.26789844
7    Y   B     K 0.07685341
8    Y   B     L 0.21372276
9    Y   B     L 0.13620971
10   Z   B     L 0.49073692
11   Z   C     L 0.52968279

I tried:

aggregate(df, df[,c(1:3)],FUN = c(length,mean))

and received:

Error in match.fun(FUN) : 
'c(length, mean)' is not a function, character or symbol

I want to aggregate into a new data frame that gives the frequency of each unique triad (One, Two, Three), plus a column containing the median of Number for each unique triad. For the (X, A, J) triad, for example, I want Count = 4 and Median equal to the median of the first four values of Number.

Recommended answer

You can use dplyr:

library(dplyr)
res <- df %>%
  group_by(One, Two, Three) %>%
  summarize(length = n(), Mean = mean(Number))  # change `mean` to `median` if you want the median

str(res)
# Classes 'grouped_df', 'tbl_df', 'tbl' and 'data.frame':    7 obs. of  5 variables:

str(as.data.frame(res))
# 'data.frame':  7 obs. of  5 variables:
# $ One   : Factor w/ 3 levels "X","Y","Z": 1 1 1 2 2 3 3
# $ Two   : Factor w/ 3 levels "A","B","C": 1 2 2 2 2 2 3
# $ Three : Factor w/ 3 levels "J","K","L": 1 1 2 2 3 3 3
# $ length: int  4 1 1 1 2 1 1
# $ Mean  : num  0.689 0.989 0.524 0.181 0.345 ...
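
The question asks for a count and a median rather than a mean. A minimal sketch of that variant follows, assuming dplyr 1.0.0 or later for the `.groups` argument; the `set.seed(1)` call and the names `Count`, `Median`, and `res_median` are illustrative choices, not part of the original answer.

```r
library(dplyr)

# Rebuild the question's data. Number is random, so the medians change
# from run to run unless a seed is fixed.
set.seed(1)  # assumption: fixed seed only to make this sketch reproducible
df <- data.frame(
  One    = c(rep("X", 6), rep("Y", 3), rep("Z", 2)),
  Two    = c(rep("A", 4), rep("B", 6), rep("C", 1)),
  Three  = c(rep("J", 5), rep("K", 2), rep("L", 4)),
  Number = runif(11)
)

# One row per unique (One, Two, Three) triad, with its size and median.
res_median <- df %>%
  group_by(One, Two, Three) %>%
  summarise(Count = n(), Median = median(Number), .groups = "drop")

res_median
```

Passing `.groups = "drop"` returns an ungrouped tibble, so later operations on `res_median` are not silently applied per group.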

Alternatively, with data.table:

library(data.table)
setDT(df)[,list(length=.N, Mean=mean(Number)),by=list(One,Two,Three)]
#      One Two Three length      Mean
# 1:   X   A     J      4 0.3660189
# 2:   X   B     J      1 0.8389641
# 3:   X   B     K      1 0.2815004
# 4:   Y   B     K      1 0.4990414
# 5:   Y   B     L      2 0.3814621
# 6:   Z   B     L      1 0.1144003
# 7:   Z   C     L      1 0.9508751
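
For completeness, the original `aggregate()` attempt fails because `FUN` must be a single function, while `c(length, mean)` is a list of two functions. One possible base R workaround, sketched here under the assumption that a formula interface and a small anonymous function are acceptable, is to return both statistics from one function. The names `agg`, `res_base`, `Count`, and `Median` and the fixed seed are illustrative only.

```r
# Rebuild the question's data. Number is random, so results vary per run.
set.seed(1)  # assumption: fixed seed only to make this sketch reproducible
df <- data.frame(
  One    = c(rep("X", 6), rep("Y", 3), rep("Z", 2)),
  Two    = c(rep("A", 4), rep("B", 6), rep("C", 1)),
  Three  = c(rep("J", 5), rep("K", 2), rep("L", 4)),
  Number = runif(11)
)

# FUN is a single function that returns both statistics as a named vector.
agg <- aggregate(Number ~ One + Two + Three, data = df,
                 FUN = function(x) c(Count = length(x), Median = median(x)))

# aggregate() stores the two values in one matrix column; expand that
# matrix into ordinary columns and give them plain names.
res_base <- do.call(data.frame, agg)
names(res_base)[4:5] <- c("Count", "Median")

res_base
```

The flattening step matters because `agg$Number` is a two-column matrix, which prints like separate columns but cannot be addressed as `agg$Count` or `agg$Median`.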
