Calculating grouped variance from a frequency table in R


Question

How can I calculate, in R, the overall variance and the variance for each group from a dataset that looks like this (for example):

Group Count Value
A      3     5
A      2     8
B      1     11
B      3     15

I know how to calculate the variance as a whole, ignoring the groups: var(rep(x$Value, x$Count)). But how do I automatically calculate the variance for each group, accounting for the frequency (e.g., the variance for group A, group B, and so on)? I would like my output to have the following headers:

Group, Total Count, Group Variance 

I have also reviewed this link: "R computing mean, median, variance from file with frequency distribution". That question is different (it does not have the group component), so this is not a duplicate.

Thanks for all your help.

Recommended answer

One option is to use data.table. Convert the data.frame to a data.table with setDT, then compute the var of "Value" (expanded by "Count" via rep) and the sum of "Count", grouped by "Group".

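library(data.table)
# setDT() converts df1 to a data.table by reference;
# rep(Value, Count) repeats each Value Count times before var() is applied
setDT(df1)[, list(GroupVariance = var(rep(Value, Count)),
                  TotalCount = sum(Count)), by = Group]
#    Group GroupVariance TotalCount
#1:     A           2.7          5
#2:     B           4.0          4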

A similar approach using dplyr is:

library(dplyr)
df1 %>%
  group_by(Group) %>%
  summarise(GroupVariance = var(rep(Value, Count)), TotalCount = sum(Count))
#     Group GroupVariance TotalCount
#1     A           2.7          5
#2     B           4.0          4
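
For completeness, here is a minimal base R sketch that needs no extra packages and avoids expanding the data with rep. It assumes df1 holds the table from the question (it is rebuilt here so the snippet is self-contained). The helper name freq_var is hypothetical, introduced only for illustration. It applies the usual frequency-weighted sample variance, sum(n * (x - mean)^2) / (N - 1), which should agree with var(rep(Value, Count)).

```r
# Data from the question, rebuilt as a plain data.frame
# (df1 is the name the answer uses; the construction here is an assumption)
df1 <- data.frame(
  Group = c("A", "A", "B", "B"),
  Count = c(3, 2, 1, 3),
  Value = c(5, 8, 11, 15)
)

# Hypothetical helper: frequency-weighted sample variance,
# sum(count * (value - mean)^2) / (N - 1), where N = sum(count)
freq_var <- function(value, count) {
  n_total <- sum(count)
  weighted_mean <- sum(count * value) / n_total
  sum(count * (value - weighted_mean)^2) / (n_total - 1)
}

# Split the rows by group and build one summary row per group
result <- do.call(rbind, lapply(split(df1, df1$Group), function(g) {
  data.frame(
    Group = g$Group[1],
    TotalCount = sum(g$Count),
    GroupVariance = freq_var(g$Value, g$Count)
  )
}))
rownames(result) <- NULL
print(result)
```

For the sample data this gives the same group variances as the answers above (2.7 for A and 4.0 for B). Working from the counts directly may be preferable when the counts are large, since the repeated vector is never materialised.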
