dplyr: Find mean for each bin by groups

Problem description

I am trying to understand dplyr. I am splitting the values in my data frame by group, bin, and sign, and I am trying to get a mean value for each group/bin/sign combination. I would like to output a data frame with the count for each group/bin/sign combination, plus the total count for each group. I think I have it, but sometimes I get different values in base R compared to the output of dplyr. Am I doing this correctly? It is also very contorted... is there a more direct way? Thank you!

library(ggplot2)
df <- data.frame(
  id = sample(LETTERS[1:3], 100, replace = TRUE),
  tobin = rnorm(1000),
  value = rnorm(1000)
)
df$tobin[sample(nrow(df), 10)] = 0

df$bin = cut_interval(abs(df$tobin), length = 1)
df$sign = ifelse(df$tobin == 0, "NULL", ifelse(df$tobin > 0, "-", "+"))

# Find mean of value by group, bin, and sign using dplyr
library(dplyr)
res <- df %>% group_by(id, bin, sign) %>%
  summarise(Num = length(bin), value = mean(value, na.rm = TRUE))

res %>% group_by(id) %>%
  summarise(total = sum(Num))
res = data.frame(res)
total = data.frame(total)
res$total = total[match(res$id, total$id), "total"]

res[res$id == "A" & res$bin == "[0,1]" & res$sign == "NULL", ]

# Check in base R whether the mean by group, bin, and sign is correct (sometimes not?)
groupA = df[df$id == "A" & df$bin == "[0,1]" & df$sign == "NULL", ]
mean(groupA$value, na.rm = TRUE)

I am going crazy because it doesn't work on my data, and this command just repeats the mean of the whole dataset:

ddply(df, .(id, bin, sign), summarize, mean = mean(value,na.rm=TRUE))

Where mean is equal to mean(value,na.rm=TRUE), completely ignoring the grouping...All the groups are factors, and the value is numeric...

This however works:

with(df, aggregate(df$value, by = list(id, bin, sign), FUN = function(x) c(mean(x))))

Please help me..

Solution

You seem to be flailing a bit. You've got correct code, then you've got extra code.

Starting from a fresh R session and defining your data, then

library(dplyr)
res <- df %>% group_by(id, bin, sign) %>%
        summarise(Num = n(), value = mean(value,na.rm=TRUE))

The above code is from your question, but I replaced length(bin) with the built-in dplyr::n() function. The above code accurately gives the group-wise averages:

head(res)
#   id   bin sign Num       value
# 1  A [0,1]    - 122 -0.08330338
# 2  A [0,1]    + 111  0.11394381
# 3  A [0,1] NULL   2  0.75232462
# 4  A (1,2]    -  54 -0.09236725
# 5  A (1,2]    +  45  0.20581095
# 6  A (2,3]    -  12 -0.08998771

Jumping ahead to your last couple lines in the code block:

groupA = df[df$id == "A" & df$bin == "[0,1]" & df$sign == "NULL", ]
mean(groupA$value, na.rm = TRUE)
# [1] 0.7523246

This matches the 3rd row of the results above. So you did it, it works fine!
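Rather than spot-checking one group by hand, the same comparison can be done for every group at once. The following is only a sketch on made-up data: the seed, the simplified two-level `sign` column, the `cut()` breaks, and the names `base_res` and `cmp` are illustrative assumptions, and the `.groups` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

set.seed(42)  # arbitrary seed, only so the toy data is reproducible
df <- data.frame(
  id    = sample(LETTERS[1:3], 200, replace = TRUE),
  tobin = rnorm(200),
  value = rnorm(200)
)
# Base cut() stands in for ggplot2::cut_interval() to keep the sketch dependency-light
df$bin  <- cut(abs(df$tobin), breaks = c(0, 1, 2, Inf), include.lowest = TRUE)
df$sign <- ifelse(df$tobin > 0, "+", "-")  # simplified labels (assumption)

# Group-wise means with dplyr
res <- df %>%
  group_by(id, bin, sign) %>%
  summarise(Num = n(), value = mean(value, na.rm = TRUE), .groups = "drop")

# The same means with base R
base_res <- aggregate(value ~ id + bin + sign, data = df, FUN = mean)

# Line the two results up by group and compare the means
cmp <- merge(as.data.frame(res), base_res,
             by = c("id", "bin", "sign"),
             suffixes = c("_dplyr", "_base"))
all.equal(cmp$value_dplyr, cmp$value_base)
```

If the two approaches agree, `all.equal()` reports no difference, which is a quicker sanity check than subsetting one group at a time.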

The rest of your code is confused:

res %>% group_by(id) %>%
  summarise(total = sum(Num))

I'm not sure what you're trying to accomplish with this, but you don't assign it to anything, so it is run but not saved. The later lines that use total therefore have no total object to work with.
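If the intent was to attach a per-id total to each group/bin/sign row, one possible approach is to regroup by `id` and use `mutate()`, which avoids the separate `total` object and the `match()` step. This is a sketch of one option rather than a reading of what the question intended; the toy data, the seed, and the simplified `sign` labels are assumptions, and `.groups` assumes dplyr 1.0.0 or later.

```r
library(dplyr)

set.seed(42)  # arbitrary seed for reproducible toy data
df <- data.frame(
  id    = sample(LETTERS[1:3], 200, replace = TRUE),
  tobin = rnorm(200),
  value = rnorm(200)
)
df$bin  <- cut(abs(df$tobin), breaks = c(0, 1, 2, Inf), include.lowest = TRUE)
df$sign <- ifelse(df$tobin > 0, "+", "-")  # simplified labels (assumption)

res <- df %>%
  group_by(id, bin, sign) %>%
  summarise(Num = n(), value = mean(value, na.rm = TRUE), .groups = "drop") %>%
  group_by(id) %>%
  mutate(total = sum(Num)) %>%  # per-id total, repeated on each row of that id
  ungroup()

head(res)
```

Because `mutate()` keeps every row, each group/bin/sign combination carries the total count for its `id` in the same data frame.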

As for your ddply attempt:

ddply(df, .(id, bin, sign), summarize, mean = mean(value,na.rm=TRUE))

You'll notice that if you have dplyr loaded and then load the plyr library, there's a message that:

You have loaded plyr after dplyr - this is likely to cause problems. If you need functions from both plyr and dplyr, please load plyr first, then dplyr: library(plyr); library(dplyr)

Do not ignore this warning! My guess is this happened, you ignored it, and that's part of the source of your troubles. Probably you don't need plyr at all, but if you do, load it before dplyr!
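One way to make the code independent of load order is to call the functions with an explicit package prefix, so that it does not matter which package's `summarise` is currently attached. The sketch below uses only dplyr and does not attach any package; the toy data, the seed, the simplified `sign` labels, and the intermediate name `grouped` are illustrative assumptions, and `.groups` assumes dplyr 1.0.0 or later.

```r
set.seed(42)  # arbitrary seed for reproducible toy data
df <- data.frame(
  id    = sample(LETTERS[1:3], 200, replace = TRUE),
  tobin = rnorm(200),
  value = rnorm(200)
)
df$bin  <- cut(abs(df$tobin), breaks = c(0, 1, 2, Inf), include.lowest = TRUE)
df$sign <- ifelse(df$tobin > 0, "+", "-")  # simplified labels (assumption)

# Namespace-qualified calls: these always resolve to dplyr's versions,
# regardless of whether plyr was attached before or after dplyr
grouped <- dplyr::group_by(df, id, bin, sign)
res <- dplyr::summarise(
  grouped,
  Num   = dplyr::n(),
  value = mean(value, na.rm = TRUE),
  .groups = "drop"
)

head(res)
```

The same idea applies in the other direction: writing `plyr::ddply` makes it explicit that the plyr function is wanted.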
