Calculate proportions within subsets of a data frame
Question
I am trying to obtain proportions within subsets of a data frame. For example, in this made-up data frame:
DF<-data.frame(category1=rep(c("A","B"),each=9),
category2=rep(rep(LETTERS[24:26],each=3),2),
animal=rep(c("dog","cat","mouse"),6),number=sample(18))
I would like to calculate the proportion of each of the three animals for each category1 by category2 combination (e.g., out of all animals that are both "A" and "X", what proportion are dogs?). With prop.table on column 4 of the data frame I can get the proportion that each row makes up of the total "number" column, but I have not found a way to do this for subsets based on category1 and category2. I also tried splitting the data by category1 and category2 using this:
splitDF<-split(DF,list(DF$category1,DF$category2))
I was hoping I could then apply a function with prop.table to get the proportions of each animal within each split group, but I cannot get prop.table working because I can't seem to specify which column of data to apply the function to within the split groups. Does anyone have any tips? Maybe this is possible with plyr or something similar? I can't find anything in the help forums about ways to get proportions within subsets of data.
Recommended answer
You can use the function ddply() from the plyr package to calculate the proportions within each category1 by category2 combination and add them to the data frame as a new column.
library(plyr)
DF<-ddply(DF,.(category1,category2),transform,prop=number/sum(number))
DF
  category1 category2 animal number       prop
1         A         X    dog     17 0.44736842
2         A         X    cat      3 0.07894737
3         A         X  mouse     18 0.47368421
4         A         Y    dog      2 0.14285714
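If you would rather not load plyr, the same per-group proportions can probably be obtained in base R. The sketch below is not part of the original answer. It rebuilds the example data (the set.seed() call is my addition, only so the run is reproducible, so the numbers will differ from the output above), uses ave() to apply prop.table within each category1 by category2 group, and then shows how the split() attempt from the question can be made to work by picking the column inside an anonymous function. The names prop2 and DF2 are made up for illustration.

```r
# Rebuild the example data; set.seed() is an addition for reproducibility.
set.seed(1)
DF <- data.frame(category1 = rep(c("A", "B"), each = 9),
                 category2 = rep(rep(LETTERS[24:26], each = 3), 2),
                 animal    = rep(c("dog", "cat", "mouse"), 6),
                 number    = sample(18))

# ave() applies FUN within each category1 x category2 group and returns
# a vector aligned with the original rows.
DF$prop <- ave(DF$number, DF$category1, DF$category2, FUN = prop.table)

# The split() route from the question: select the column inside the
# function passed to lapply(), then bind the pieces back together.
splitDF <- split(DF, list(DF$category1, DF$category2))
splitDF <- lapply(splitDF, function(d) {
  d$prop2 <- prop.table(d$number)  # hypothetical column name
  d
})
DF2 <- do.call(rbind, splitDF)

# Sanity check: proportions should sum to 1 within every group.
aggregate(prop ~ category1 + category2, data = DF, FUN = sum)
```

The ave() version keeps the original row order, while the split() and rbind() version returns rows grouped by combination, with row names taken from the group labels.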